List: xerces-c-dev
Subject: Re: How do I use Xerces strings?
From: David Bertoni <dbertoni () apache ! org>
Date: 2006-03-09 19:14:31
Message-ID: 44107E97.80108 () apache ! org
Steven T. Hatton wrote:
> On Thursday 09 March 2006 12:16, David Bertoni wrote:
>> Steven T. Hatton wrote:
>
>>> wchar_t is 32 bits on my system. I believe that a 16 bit storage unit
>>> will under normal circumstances occupy a 32 bit memory location, but only
>>> use half of it.
>> Yes, and don't you think that's rather wasteful? Would you use Xerces-C
>> to process large XML documents if you knew it was wasting half of its
>> character string storage just so it could use wchar_t on all platforms?
>
> Actually, I did not state my intended meaning well, and I have now come to
> understand that I was in error. I was thinking in terms of individual units
> of storage, i.e., individual characters as opposed to containers. Containers
> (at least sequential containers) are basically arrays under the hood, so they
> do store data contiguously. I believe an individual 16-bit XMLCh will occupy
> 32-bits of storage, but that is probably a fairly rare animal, and therefore
> not worth consideration.
>
I guess I don't understand what you mean by "I believe an individual
16-bit XMLCh will occupy 32-bits of storage." How can a 16-bit XMLCh
ever occupy 32 bits of storage?
>>> Why does Xerces-C use a non-standard data type?
>> unsigned short is not a non-standard type. You may think it's
>> "non-standard" for holding character data, but Xerces-C encodes
>> character data in UTF-16 code units, and that requires a 16-bit integral
>> type.
>
> It is (AFAIK) not one of the datatypes supported by my Standard Library
> implementation. That is my point. I cannot seamlessly use it with the
> facilities provided by the C++ Standard Library.
I agree it's a big problem that you cannot use it with
std::basic_string, but there's no reason why you can't use it with the
other containers. What other facilities do you want to use?
>
>>> If my implementation doesn't support a particular locale, and
>>> therefore does not use a 16 bit or larger data type, then what are the
>>> chances that I would use Xerces-C to support such a character set?
>>
>> You've got it backwards -- Xerces-C only supports the current locale's
>> character set in a very limited fashion, by providing a way to transcode
>> UTF-16 strings to character strings in the current locale. Otherwise,
>> it operates internally exclusively in UTF-16, and it is unaffected by
>> the current locale or how the system encodes char or wchar_t.
>
> According to the standard, the C++ implementation must use a wchar_t large
> enough to hold all the characters used by that locale. Combining that
> requirement with the requirement that implementation needs to support the
> character literals of the extended character set using the naming specified
> by ISO/IEC 10646:2000, I conclude that the requirement is virtually identical
> to the requirement that it support UTF. But I won't go so far as to say
> UTF-16.
>
UTF-16 is an encoding of the 10646/Unicode character set, and you've
stated previously that the C++ standard does not talk about encodings:
> The C++ Standard only specifies character sets. It does not specify
> encodings.
There is no requirement that a character specified with a universal
character name be encoded in any particular way -- it's just another way
to name a character.
My version of the standard also has this to say:
"If the hexadecimal value for a universal character name is less than
0x20 or in the range 0x7F-0x9F (inclusive), or if the universal
character name designates a character in the basic source character set,
then the program is ill-formed."
That restricts the usage of universal character names too severely for
Xerces-C's purposes.
Dave