List: xerces-c-dev
Subject: RE: R: using non standard character with zerces
From: "Jesse Pelton" <jsp () PKC ! com>
Date: 2005-09-16 17:04:51
Message-ID: 16E2027582CDB74180896CDB4B8CC1F9F15638 () PKCVT01 ! pkc ! com
Darn. Alberto beat me to the punch, with a typically lucid explanation.
I would just add that if you want to include the euro symbol, it's U+20AC in Unicode,
not U+00A5 (which, as Alberto pointed out, is the yen symbol). If you're using MSVC,
you can include it in a text string as follows:
const XMLCh * pszText = L"Price: \x20AC" L"4000";
There are (I think) two interesting things going on here. Most important, you can
get away with this because MSVC supports "wide character" strings (designated with
the L prefix). These are UTF-16, and therefore interchangeable with XMLCh. On other
platforms, you may have to resort to nastier techniques.
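One such technique, for platforms where wchar_t is not 16 bits wide, is to spell the string out as an array of 16-bit code units. The following is only a sketch: XMLCh16 is a stand-in typedef (the real XMLCh comes from the Xerces headers, and its underlying type depends on platform and version), and utf16Length is a hypothetical helper added for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for illustration only: the real XMLCh is defined by the Xerces
// headers, and its underlying type depends on platform and version.
typedef std::uint16_t XMLCh16;

// "Price: " + U+20AC + "4000", written as explicit UTF-16 code units so
// that the size of wchar_t on the target platform does not matter.
static const XMLCh16 kPriceText[] = {
    'P', 'r', 'i', 'c', 'e', ':', ' ',
    0x20AC,                  // EURO SIGN
    '4', '0', '0', '0',
    0                        // terminator
};

// Hypothetical helper: length in code units, excluding the terminator.
std::size_t utf16Length(const XMLCh16* s) {
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}
```

It is verbose, but nothing in it depends on the compiler's wide-character support.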
Second, you have to split the string in this example into two chunks (which the
compiler will then concatenate into one) in order to avoid the string "\x20AC4000",
which looks to the compiler like you're trying to stuff a huge 4-byte value into a
two-byte wide character. Splitting the string allows the compiler to determine that
you really only intend "\x20AC" to go into a single wide character before it
concatenates the fragments.
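For comparison, here is a small sketch of the split form next to an alternative that, as far as I know, standard C++ also allows: a universal character name (\u followed by exactly four hex digits). Because \u stops after four digits, the "4000" that follows is not swallowed. The function names are hypothetical, and plain wchar_t is used rather than XMLCh so the snippet stands alone without Xerces.

```cpp
#include <cwchar>

// Split form: the \x escape ends at the end of the first literal, and the
// compiler concatenates the two literals after processing the escapes.
const wchar_t* splitForm() {
    return L"Price: \x20AC" L"4000";
}

// Universal character name: \u always takes exactly four hex digits, so
// the digits that follow are ordinary characters.
const wchar_t* ucnForm() {
    return L"Price: \u20AC4000";
}

// True when both spellings produce the same wide string.
bool sameText() {
    return std::wcscmp(splitForm(), ucnForm()) == 0;
}
```

Whether a given compiler version accepts \u in wide literals is worth checking before relying on it.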
> -----Original Message-----
> From: AESYS S.p.A. [Enzo Arlati] [mailto:enzo.arlati@aesys.it]
> Sent: Friday, September 16, 2005 12:06 PM
> To: c-dev@xerces.apache.org
> Subject: R: R: using non standard character with zerces
>
> But how can I include a special character inside a node?
> I want to use the format &#xXX; but the '&' is processed and translated
> into &amp;, so the character reference &#xA5; is converted to &amp;#xA5;
> instead of the desired character entity.
> Thanks for your help.
>
>
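The behaviour described above follows from how text-node content is serialized: everything in a text node is character data, so a literal '&' is escaped on output, and a character reference typed into the string is just six ordinary characters. The toy escaper below is a sketch of that idea only. It is not the Xerces DOMWriter implementation; it assumes an ASCII output encoding, uses std::u16string as a stand-in for an XMLCh buffer, and escapeText is a hypothetical name.

```cpp
#include <cstdio>
#include <string>

// Toy serializer for text-node content, assuming ASCII output.
// Not the Xerces implementation; surrogate pairs are not combined.
std::string escapeText(const std::u16string& text) {
    std::string out;
    for (char16_t c : text) {
        switch (c) {
            case u'&': out += "&amp;"; break;   // markup characters are escaped
            case u'<': out += "&lt;";  break;
            case u'>': out += "&gt;";  break;
            default:
                if (c < 0x80) {
                    out += static_cast<char>(c);
                } else {
                    // Not representable in ASCII: emit a character reference.
                    char buf[16];
                    std::snprintf(buf, sizeof buf, "&#x%X;",
                                  static_cast<unsigned>(c));
                    out += buf;
                }
        }
    }
    return out;
}
```

With this model, passing the text "&#xA5;" yields "&amp;#xA5;", while passing the single character U+00A5 yields "&#xA5;". So the character itself belongs in the text node, and the reference is left for the serializer to produce.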
> -----Original Message-----
> From: Alberto Massari [mailto:amassari@datadirect.com]
> Sent: Thursday, September 15, 2005 12:10
> To: c-dev@xerces.apache.org
> Subject: Re: R: using non standard character with zerces
>
>
> Hi Enzo,
>
> At 11.39 15/09/2005 +0200, AESYS S.p.A. [Enzo Arlati] wrote:
>
> > I have a source like this, which I read and parse :
> >
> > <?xml version="1.0" standalone="no" ?>
> > <Messaggio>
> > ...........
> > <Test1> start &lt; &gt; &amp; ( &#xA5; ) end </Test1>
> > </Messaggio>
> >
> > [...]
> > while if I use the function dom_wr->writeToString ( see below ) I get an
> > empty string
> >
> > chXml = dom_wr->writeToString( *pDoc );
> > delete dom_wr;
> > delete errHandlerDomWriter;
> >
> > sXml = XMLString::transcode( chXml );
> > sres = string( sXml );
> > XMLString::release( &sXml );
> >
> > sres is EMPTY
>
> The result of writeToString is a Unicode string;
> XMLString::transcode tries to convert that
> Unicode string into the local code page of your
> Linux box. I guess your current local code page
> is unable to represent the Unicode character 0xA5
> (the yen symbol). If you really want to display
> that Unicode string on your terminal, you should
> change your code page to be latin-1 or its equivalents (e.g.
> ISO-8859-1)
>
> Alberto
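If the goal is a byte string rather than terminal output, one way around the local code page is to convert to UTF-8, which can represent every Unicode character. The converter below is a hand-rolled sketch for illustration only: std::u16string stands in for an XMLCh buffer (an assumption), and utf16ToUtf8 is a hypothetical name, not a Xerces function. In real code a library transcoder for a named encoding would be preferable.

```cpp
#include <cstddef>
#include <string>

// Convert UTF-16 code units to UTF-8 bytes.
// Unpaired surrogates are passed through as 3-byte sequences, not rejected.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Under this scheme the yen sign U+00A5 becomes the two bytes C2 A5, so nothing is lost regardless of the machine's locale.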
>
>
> > Do you have any idea about that?
> > I'm using xerces 2.4 on redhat 7.3
> >
> >
> >
> >
> >
> > DOCUMENT:
> >
> > -----Original Message-----
> > From: AESYS S.p.A. [Enzo Arlati] [mailto:enzo.arlati@aesys.it]
> > Sent: Wednesday, September 14, 2005 14:24
> > To: c-dev@xerces.apache.org
> > Subject: using non standard character with zerces
> >
> >
> >
> > I'm using code like that shown below to build a DOM document.
> >
> > DOMElement * pTestRef;
> > string stmp;
> > stmp = string( "this is a test: <> & ¥ " );
> > pTestRef = pDoc->createElement( X("TEST_REFERENCE_1") );
> > dtxt = pDoc->createTextNode( X( stmp.c_str() ) );
> > pRoot->appendChild( pTestRef );
> > pTestRef->appendChild( dtxt );
> >
> > The output I get is shown below, where the entities &lt;, &gt; and &amp; are
> > correctly translated.
> >
> > <?xml version="1.0" encoding="UTF-16" standalone="no" ?>
> > <Messaggio>
> > ......
> > <TEST_REFERENCE_1>this is a test: &lt;&gt; &amp; &#165;
> > </TEST_REFERENCE_1>
> > </Messaggio>
> >
> > What I can't do is pass other entities in hex or decimal notation, like
> > &#xA5; for the euro character, because the first &, which is part of the
> > whole entity, is translated separately.
> > How is it possible to tell DOMWriter to leave as-is ( without translating
> > the & char ) entities composed of several characters ( like &#xA5; ,
> > &#165; or &lt; )?
> >
> > Regards,
> > Enzo Arlati
> > enzo.arlati@aesys.it
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
> > For additional commands, e-mail: c-dev-help@xerces.apache.org