'RE: R: using non standard character with zerces'


List:       xerces-c-dev
Subject:    RE: R: using non standard character with zerces
From:       "Jesse Pelton" <jsp () PKC ! com>
Date:       2005-09-16 17:04:51
Message-ID: 16E2027582CDB74180896CDB4B8CC1F9F15638 () PKCVT01 ! pkc ! com

Darn.  Alberto beat me to the punch, with a typically lucid explanation.

I would just add that if you want to include the euro symbol, it's U+20AC in Unicode,
not U+00A5 (which, as Alberto pointed out, is the yen symbol).  If you're using MSVC,
you can include it in a text string as follows:

  const XMLCh * pszText = L"Price: \x20AC" L"4000";

There are (I think) two interesting things going on here.  Most important, you can
get away with this because MSVC supports "wide character" strings (designated with
the L prefix).  With MSVC these are UTF-16 (wchar_t is 16 bits there), and therefore
interchangeable with XMLCh.  On other platforms, where wchar_t is often 32 bits, you
may have to resort to nastier techniques.
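
One of those nastier techniques, sketched here under stated assumptions, is to spell the string out as an array of 16-bit code units so nothing depends on what L"..." means to a given compiler.  This is not Xerces code: Utf16Unit is a stand-in typedef so the example compiles without the library (real code would use XMLCh from the Xerces headers instead), and kPriceText and unitLength are hypothetical names.

```cpp
#include <cstddef>

// Stand-in for a 16-bit code unit type (assumption; with Xerces you would
// use XMLCh from the library headers instead of this typedef).
typedef unsigned short Utf16Unit;

// "Price: " + EURO SIGN + "4000", written one UTF-16 code unit at a time.
static const Utf16Unit kPriceText[] = {
    'P', 'r', 'i', 'c', 'e', ':', ' ',
    0x20AC,              // U+20AC EURO SIGN
    '4', '0', '0', '0',
    0                    // null terminator
};

// Count the code units before the terminator (hypothetical helper).
std::size_t unitLength(const Utf16Unit* s) {
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}
```

It is tedious to type, but each element is an ordinary integer constant, so the hex-escape problem described next never comes up.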

Second, you have to split the string in this example into two chunks (which the
compiler will then concatenate into one) in order to avoid the string "\x20AC4000".
A hex escape swallows every hex digit that follows it, so to the compiler that looks
like you're trying to stuff a huge 4-byte value (0x20AC4000) into a two-byte wide
character.  Splitting the string lets the compiler see that you really only intend
"\x20AC" to go into a single wide character before it concatenates the fragments.
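
To make the escape issue concrete, here is a small sketch that assumes a C++11 compiler, since it uses u"..." literals and std::u16string rather than MSVC wide strings.  The function names are hypothetical.  It shows the split-literal trick next to a universal character name (\u followed by exactly four hex digits), which avoids the problem because it cannot run on into the digits that follow.

```cpp
#include <string>

// Split literal: the hex escape stops at the closing quote of the first
// piece, and the compiler concatenates the two pieces afterwards.
std::u16string priceSplit() {
    return u"Price: \x20AC" u"4000";
}

// Universal character name: \u always takes exactly four hex digits,
// so the "4000" that follows is read as ordinary text.
std::u16string priceUcn() {
    return u"Price: \u20AC4000";
}
```

Both functions should yield the same 12 code units, with the euro sign at index 7.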

> -----Original Message-----
> From: AESYS S.p.A. [Enzo Arlati] [mailto:enzo.arlati@aesys.it] 
> Sent: Friday, September 16, 2005 12:06 PM
> To: c-dev@xerces.apache.org
> Subject: R: R: using non standard character with zerces
> 
> But how can I include a special character inside a node?
> I want to use the format &#xXX;, but the '&' is processed and translated
> into &amp;, so the character &#xA5; is converted to &amp;#xA5; instead of
> the desired character entity.
> Thanks for your help.
> 
> 
> -----Original Message-----
> From: Alberto Massari [mailto:amassari@datadirect.com]
> Sent: Thursday, September 15, 2005 12:10
> To: c-dev@xerces.apache.org
> Subject: Re: R: using non standard character with zerces
> 
> 
> Hi Enzo,
> 
> At 11.39 15/09/2005 +0200, AESYS S.p.A. [Enzo Arlati] wrote:
> 
> > I have a source like this, which I read and parse:
> > 
> > <?xml version="1.0"  standalone="no" ?>
> > <Messaggio>
> > ...........
> > <Test1> start  &lt; &gt;  &amp; &#x28;  &#xA5; &#x29;  end  </Test1>
> > </Messaggio>
> > 
> > [...]
> > while if I use the function dom_wr->writeToString (see below) I get an
> > empty string
> > 
> > chXml     = dom_wr->writeToString( *pDoc );
> > delete dom_wr;
> > delete errHandlerDomWriter;
> > 
> > sXml = XMLString::transcode( chXml );
> > sres = string( sXml );
> > XMLString::release( &sXml );
> > 
> > sres is EMPTY
> 
> The result of writeToString is a Unicode string;
> XMLString::transcode tries to convert that
> Unicode string into the local code page of your
> Linux box. I guess your current local code page
> is unable to represent the Unicode character 0xA5
> (the yen symbol). If you really want to display
> that Unicode string on your terminal, you should
> change your code page to be latin-1 or its equivalents (e.g. ISO-8859-1)
> 
> Alberto
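
If changing the code page is not an option, another route is to convert the UTF-16 result to UTF-8 yourself instead of going through the local code page.  The following is only a sketch under assumptions: it uses std::u16string (C++11) in place of the XMLCh buffer returned by writeToString, it does not use any Xerces transcoding API, and utf16ToUtf8 is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>

// Encode UTF-16 code units as UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this, the yen sign U+00A5 becomes the two bytes C2 A5 and the euro sign U+20AC becomes E2 82 AC.  Whether the terminal then displays them correctly still depends on it being set up for UTF-8.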
> 
> 
> > Do you have any idea about that?
> > I'm using Xerces 2.4 on Red Hat 7.3
> > 
> > 
> > 
> > 
> > 
> > DOCUMENT:
> > 
> > -----Original Message-----
> > From: AESYS S.p.A. [Enzo Arlati] [mailto:enzo.arlati@aesys.it]
> > Sent: Wednesday, September 14, 2005 14:24
> > To: c-dev@xerces.apache.org
> > Subject: using non standard character with zerces
> > 
> > 
> > 
> > I'm using code like that shown below to build a DOM document.
> > 
> > DOMElement * pTestRef;
> > string stmp;
> > stmp = string( "this is a test: <> & &#165; " );
> > pTestRef = pDoc->createElement( X("TEST_REFERENCE_1") );
> > dtxt = pDoc->createTextNode( X( stmp.c_str() ) );
> > pRoot->appendChild( pTestRef );
> > pTestRef->appendChild( dtxt );
> > 
> > The output I get is shown below, where the characters <, > and & are
> > correctly translated.
> > 
> > <?xml version="1.0" encoding="UTF-16" standalone="no" ?>
> > <Messaggio>
> > ......
> > <TEST_REFERENCE_1>this is a test: &lt;&gt; &amp; &amp;#165;
> > </TEST_REFERENCE_1>
> > </Messaggio>
> > 
> > What I can't do is pass other entities in hex or decimal notation, like
> > &#165; for the euro character, because the first &, which is part of the
> > whole entity, is translated separately.
> > How can I tell DOMWriter to leave as is (without translating the
> > & character) entities composed of several characters (like &#xA5;, &#165; or
> > &lt;)?
> > 
> > Regards,
> > Enzo Arlati
> > enzo.arlati@aesys.it
> > 
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
> > For additional commands, e-mail: c-dev-help@xerces.apache.org

