'R: R: R: R: using non standard character with zerces'


List:       xerces-c-dev
Subject:    R: R: R: R: using non standard character with zerces
From:       "AESYS S.p.A. [Enzo Arlati]" <enzo.arlati () aesys ! it>
Date:       2005-09-20 12:33:57
Message-ID: 002e01c5bddf$927624a0$dac8a8c0 () enzo

I understand that my solution has one main limitation: it can only deal with
codes between 0 and 0xFF.
Larger codes will be truncated.
But for now I'm working mainly with standard (non-Unicode) character sets, and
I only need to transcode from 8 to 16 bits when I work with Xerces.
I don't think my solution should lead to any problems, at least for now, but
any advice for improvement is always welcome.
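
For reference, the kind of 8-to-16-bit copy described above can be sketched
roughly as follows. This is only an illustration, not the actual XStr
modification: Char16, widenBytes and narrowByTruncation are hypothetical
names, and Char16 merely stands in for a 16-bit XMLCh. The widening step
gives correct Unicode values only when the 8-bit text is ISO-8859-1, whose
256 codes coincide with the first 256 Unicode code points. The narrowing
step shows the truncation for anything above 0xFF. The code sticks to C++98
constructs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a 16-bit XMLCh (assumption, not the Xerces type).
typedef unsigned short Char16;

// Widen each 8-bit code to the same 16-bit value and append a terminator.
std::vector<Char16> widenBytes(const std::string& narrow) {
    std::vector<Char16> wide;
    wide.reserve(narrow.size() + 1);
    for (std::size_t i = 0; i < narrow.size(); ++i) {
        // Go through unsigned char so 0xA5 does not sign-extend to 0xFFA5.
        wide.push_back(static_cast<unsigned char>(narrow[i]));
    }
    wide.push_back(0);
    return wide;
}

// Narrow by keeping only the low byte; codes above 0xFF are damaged.
std::string narrowByTruncation(const std::vector<Char16>& wide) {
    std::string narrow;
    for (std::size_t i = 0; i < wide.size() && wide[i] != 0; ++i) {
        narrow.push_back(static_cast<char>(wide[i] & 0xFF));
    }
    return narrow;
}

// Self-check covering the working case and the lossy one.
void demoWidenNarrow() {
    const std::string yen("(\xA5)");
    const std::vector<Char16> wide = widenBytes(yen);
    assert(wide.size() == 4 && wide[1] == 0x00A5);
    assert(narrowByTruncation(wide) == yen);  // codes up to 0xFF round-trip

    std::vector<Char16> euro;
    euro.push_back(0x20AC);                   // euro sign, above 0xFF
    euro.push_back(0);
    assert(narrowByTruncation(euro) == std::string("\xAC"));  // high byte lost
}
```

Calling demoWidenNarrow() exercises both directions. The cast through
unsigned char matters on platforms where plain char is signed.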

BTW I'm using gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110).



-----Original Message-----
From: Jesse Pelton [mailto:jsp@PKC.com]
Sent: Monday, 19 September 2005 19:56
To: c-dev@xerces.apache.org; enzo.arlati@aesys.it
Subject: RE: R: R: R: using non standard character with zerces


You'd use the XMLCh array (xmlStr in my example) in your calls to, for
example, createTextNode().  It's just a cumbersome but portable way to
create a string of characters in Xerces' internal format.  Xerces uses the
standardized UTF-16 encoding to represent characters internally, so XMLCh is
required to be (at least) a 2-byte (16-bit) type.  Some compilers (like
Microsoft's) have a native string type that is an exact match.  With such a
compiler, this:

   const XMLCh* xmlStr = L"(\xA5)";

is equivalent (for our purposes) to:

   XMLCh xmlStr[] = { '(', 0xA5, ')', chNull };

So, with either xmlStr, you could make a call like:

  dtxt = pDoc->createTextNode(xmlStr);

This would create a text node with a parenthesized yen symbol, which you
could then insert into the document.
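
As a rough illustration of the equivalence between the two forms, the sketch
below compares the explicit array with a C++11 UTF-16 literal. It assumes a
C++11 compiler, which postdates this thread, and uses char16_t as a stand-in
for XMLCh. kYenArray, kYenLiteral and sameUnits are made-up names. Whether a
u"..." literal can be passed straight to Xerces depends on how XMLCh is
defined in a given build, which is not verified here.

```cpp
#include <cassert>
#include <string>

// Array form, as in the message above; the trailing 0 plays the role of chNull.
static const char16_t kYenArray[] = { u'(', 0x00A5, u')', 0 };

// C++11 literal form: u"..." produces UTF-16 code units on every platform.
static const char16_t* const kYenLiteral = u"(\u00A5)";

// Compare two null-terminated UTF-16 strings unit by unit.
bool sameUnits(const char16_t* a, const char16_t* b) {
    return std::u16string(a) == std::u16string(b);
}

// Self-check: both spellings hold '(' U+00A5 ')'.
void demoYenForms() {
    assert(sameUnits(kYenArray, kYenLiteral));
    assert(std::u16string(kYenArray).size() == 3);
}
```

Unlike L"...", whose unit size follows wchar_t, the u"..." form has the same
16-bit unit type regardless of compiler.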

If your compiler does not have an internal string format that matches
Xerces', XMLCh is typically defined as an unsigned short on the assumption
that it will be a two-byte type.  (This is the case for GCC.  wchar_t could
in theory be used, but it's a four-byte type, which is wasteful for most
documents.)  There's no string notation for integral types, hence the
necessity to use the cumbersome array notation to create your XMLCh strings.
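
The size mismatch can be checked directly. The sketch below is only a probe
under stated assumptions: Char16 is a hypothetical stand-in for an unsigned
short XMLCh, and bitsIn and wideLiteralMatchesChar16 are made-up helper
names. The result of the comparison is platform dependent, so no particular
outcome is claimed.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical stand-in for XMLCh where it is defined as unsigned short.
typedef unsigned short Char16;

// Number of bits in the object representation of T.
template <typename T>
std::size_t bitsIn() {
    return sizeof(T) * CHAR_BIT;
}

// True when a wide literal (L"...") has the same unit size as Char16.
bool wideLiteralMatchesChar16() {
    return bitsIn<wchar_t>() == bitsIn<Char16>();
}

// Self-check: only the portable guarantee is asserted.
void demoUnitSizes() {
    assert(bitsIn<Char16>() >= 16);  // unsigned short holds at least 16 bits
    // Typically false with GCC on Linux (32-bit wchar_t), true with MSVC.
    const bool match = wideLiteralMatchesChar16();
    assert(match == (sizeof(wchar_t) == sizeof(Char16)));
}
```

When wideLiteralMatchesChar16() returns false, L"..." literals cannot be used
as XMLCh strings and the array notation is the portable choice.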

Your modifications to XStr do not look safe to me.  You appear to be
assuming that simple copies from one form to another will suffice, which
effectively removes the transcoding that is the primary purpose of the
class.  You can get away with this sometimes, but it won't work in the
general case.  The fact that defeating the transcoding makes things appear
to work lends support to Alberto's hypothesis that your current local code
page is unable to represent one or more of the characters that you want to
transcode.  Consequently, the transcoding fails with the original XStr.
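
To make the point about simple copies concrete, here is a small sketch that
contrasts a plain copy with a code-page-aware mapping. The two code pages
(ISO-8859-1 and ISO-8859-15) are picked purely as an example and are not
taken from this thread. Char16, CodePage, toUnicode and plainCopy are
hypothetical names. ISO-8859-15 differs from ISO-8859-1 in eight positions,
so a plain copy is right for the yen sign (0xA5) but wrong for the euro sign
(0xA4 in ISO-8859-15).

```cpp
#include <cassert>

// Hypothetical stand-in for a 16-bit XMLCh.
typedef unsigned short Char16;

// Two 8-bit code pages, chosen for illustration only.
enum CodePage { Latin1, Latin9 };  // ISO-8859-1 and ISO-8859-15

// Map one byte to its Unicode value for the given code page.
Char16 toUnicode(unsigned char byte, CodePage page) {
    if (page == Latin9) {
        switch (byte) {
            case 0xA4: return 0x20AC;  // euro sign
            case 0xA6: return 0x0160;  // S with caron
            case 0xA8: return 0x0161;  // s with caron
            case 0xB4: return 0x017D;  // Z with caron
            case 0xB8: return 0x017E;  // z with caron
            case 0xBC: return 0x0152;  // OE ligature
            case 0xBD: return 0x0153;  // oe ligature
            case 0xBE: return 0x0178;  // Y with diaeresis
            default: break;
        }
    }
    return byte;  // ISO-8859-1, and every other ISO-8859-15 position
}

// The "simple copy" approach: ignores the code page entirely.
Char16 plainCopy(unsigned char byte) {
    return byte;
}

// Self-check: the copy agrees for the yen sign but not for the euro sign.
void demoCodePages() {
    assert(toUnicode(0xA5, Latin1) == plainCopy(0xA5));  // yen, U+00A5
    assert(toUnicode(0xA5, Latin9) == plainCopy(0xA5));  // still yen
    assert(toUnicode(0xA4, Latin9) == 0x20AC);           // euro sign
    assert(plainCopy(0xA4) == 0x00A4);                   // currency sign instead
}
```

The copy only happens to be correct when the source text really is
ISO-8859-1; for any other 8-bit encoding some bytes end up as the wrong
characters.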

I'd avoid transcoding altogether unless you know precisely why you're doing
it and what will happen.  Specifically, the XStr class is too simple-minded
to handle the text you're giving it.  It happens to work for the ASCII text
in the sample apps, but it's not really general.  If you want to continue to
use it, you should probably enhance it to transcode to UTF-16 rather than
the local code page, since UTF-16 is what Xerces is expecting.
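
As a sketch of what producing UTF-16 involves, the function below appends one
Unicode code point to a buffer of 16-bit units, using a surrogate pair above
U+FFFF. It is not part of XStr or of Xerces: Char16 and appendUtf16 are
hypothetical names, and the step that turns bytes in some source encoding
into code points is assumed to happen elsewhere.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a 16-bit XMLCh.
typedef unsigned short Char16;

// Append one Unicode code point to a UTF-16 buffer.
// Returns false, appending nothing, for surrogates and values above U+10FFFF.
bool appendUtf16(unsigned long cp, std::vector<Char16>& out) {
    if (cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL)) {
        return false;
    }
    if (cp < 0x10000UL) {
        out.push_back(static_cast<Char16>(cp));  // one unit is enough
        return true;
    }
    cp -= 0x10000UL;  // 20 bits remain, split into two 10-bit halves
    out.push_back(static_cast<Char16>(0xD800UL + (cp >> 10)));     // high surrogate
    out.push_back(static_cast<Char16>(0xDC00UL + (cp & 0x3FFUL))); // low surrogate
    return true;
}

// Self-check with one-unit and two-unit characters.
void demoAppendUtf16() {
    std::vector<Char16> out;
    const bool ok = appendUtf16(0x00A5UL, out)    // yen sign
                 && appendUtf16(0x20ACUL, out)    // euro sign
                 && appendUtf16(0x1D11EUL, out);  // musical symbol G clef
    assert(ok);
    (void)ok;
    assert(out.size() == 4);
    assert(out[2] == 0xD834 && out[3] == 0xDD1E);

    const bool rejected = !appendUtf16(0xD800UL, out);  // lone surrogate
    assert(rejected && out.size() == 4);
    (void)rejected;
}
```

A buffer built this way still needs a trailing 0 before it is handed to an
API that expects a null-terminated string.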

I'd strongly recommend using XMLCh arrays for any literal strings instead.

Note that I could be wrong about some of this; I trust that Alberto or
someone else will point out any errors.
