
List:       xerces-c-dev
Subject:    Re: How do I use Xerces strings?
From:       "Steven T. Hatton" <hattons () globalsymmetry ! com>
Date:       2006-03-11 1:20:24
Message-ID: 200603102020.24308.hattons () globalsymmetry ! com

On Friday 10 March 2006 00:05, David Bertoni wrote:
> Steven T. Hatton wrote:
> > On Thursday 09 March 2006 19:33, David Bertoni wrote:
> >> Steven T. Hatton wrote:

> >>> What is the CPU going to stick in the other 16 bits of a 32 bit word
> >>> when it stores a single XMLCh?
> >>
> >> We must be talking about two different things, because I'm talking about
> >> an array of 16-bit integrals, so no 32-bit units of storage are
> >> involved.
> >
> > That is why I explicitly referred to individual XMLCh values as opposed
> > to sequential containers.
>
> I'm not aware of any architecture that stores a 16-bit scalar value in
> 32 bits, but I suppose there might be one.

i386 (32-bit version), i486, P, PII, PIII, P4...
#include <iostream>

int main() {
    char c('c');
    std::cout << c << std::endl;
}

Assume char is 8 bits.  The smallest retrievable unit of storage is a 32-bit 
word.  That means the CPU puts c in a 32-bit word.  What will occupy the 
other 24 bits of the word?

> > The ranges in question appear to be explicitly set aside for certain
> > purposes, or intentionally unspecified by the Unicode Standard.  In some
> > cases these "characters" overlap with specific ASCII control characters,
> > and can be expressed using the existing C++ character literal
> > representations. In the cases where the C++ Standard does not explicitly
> > specify basic character set representations, even in a fully UTF
> > compliant implementation, there would be no guarantee required of the
> > implementation to allow you to use those encodings.
>
> Do you mean to "use those characters," rather than "use those encodings?"

The encodings.  I am specifically talking about a hypothetical implementation 
that uses exactly what is required to implement UTF-16 without any remapping 
of code points.

> > IOW, you may need those values, but UTF does not give them to you.
>
> UTF-what doesn't "give them to you?"  Since they are Unicode code
> points, they can be encoded in UTF-8, UTF-16, or UTF-32.

And the result of doing so is implementation defined.

> > I mean character encodings which require more than one 16-bit unit of
> > storage.
>
> Do you mean characters whose encoding(s) in UTF-16 require more than one
> 16-bit unit of storage?  "Character encodings which require more than
> one 16-bit unit of storage" sounds like you're talking about generic
> encoding schemes that may use multiple code units, and not UTF-16 in
> particular.

I mean Unicode UTF-16.

> > That is basically my question. Is there much real cost in using UTF-16 as
> > opposed to UTF-32?  The impression I'm getting is that UTF-16 may well be
> > the better choice for the vast majority of applications.
>
> It's the age-old space vs. time trade-off, as far as I can see.

What I want to know is under what conditions the costs will be incurred.  

Steven

---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org
