List: xerces-c-dev
Subject: Re: How do I use Xerces strings?
From: "Steven T. Hatton" <hattons () globalsymmetry ! com>
Date: 2006-03-11 1:20:24
Message-ID: 200603102020.24308.hattons () globalsymmetry ! com
On Friday 10 March 2006 00:05, David Bertoni wrote:
> Steven T. Hatton wrote:
> > On Thursday 09 March 2006 19:33, David Bertoni wrote:
> >> Steven T. Hatton wrote:
> > >>> What is the CPU going to stick in the other 16 bits of a 32-bit word
> > >>> when it stores a single XMLCh?
> >>
> >> We must be talking about two different things, because I'm talking about
> >> an array of 16-bit integrals, so no 32-bit units of storage are
> >> involved.
> >
> > That is why I explicitly referred to individual XMLCh values as opposed
> > to sequential containers.
>
> I'm not aware of any architecture that stores a 16-bit scalar value in
> 32 bits, but I suppose there might be one.
The i386 (32-bit) and its successors: i486, Pentium, Pentium II, Pentium III, Pentium 4...
#include <iostream>

int main() {
    char c('c');
    std::cout << c << std::endl;
}
Assume char is 8 bits. The smallest retrievable unit of storage is a 32-bit
word, which means the CPU puts c in a 32-bit word. What will occupy the
other 24 bits of that word?
> > The ranges in question appear to be explicitly set aside for certain
> > purposes, or intentionally left unspecified, by the Unicode Standard. In
> > some cases these "characters" overlap with specific ASCII control
> > characters, and can be expressed using the existing C++ character literal
> > representations. Where the C++ Standard does not explicitly specify
> > representations for the basic character set, even a fully UTF-compliant
> > implementation would not be required to guarantee that you can use those
> > encodings.
>
> Do you mean to "use those characters," rather than "use those encodings?"
The encodings. I am specifically talking about a hypothetical implementation
that uses exactly what is required to implement UTF-16 without any remapping
of code points.
> > In other words, you may need those values, but UTF does not give them to you.
>
> UTF-what doesn't "give them to you?" Since they are Unicode code
> points, they can be encoded in UTF-8, UTF-16, or UTF-32.
And the result of doing so is implementation-defined.
> > I mean character encodings which require more than one 16-bit unit of
> > storage.
>
> Do you mean characters whose encoding(s) in UTF-16 require more than one
> 16-bit unit of storage? "Character encodings which require more than
> one 16-bit unit of storage" sounds like you're talking about generic
> encoding schemes that may use multiple code units, and not UTF-16 in
> particular.
I mean Unicode UTF-16.
> > That is basically my question: is there much real cost in using UTF-16 as
> > opposed to UTF-32? The impression I'm getting is that UTF-16 may well be
> > the better choice for the vast majority of applications.
>
> It's the age-old space vs. time trade-off, as far as I can see.
What I want to know is under what conditions the costs will be incurred.
Steven