
List:       icu
Subject:    RE: Use of U+FFFF as the "error value"
From:       Yves Arrouye <yves () realnames ! com>
Date:       2002-03-19 17:01:13

> New APIs: I am tempted to say that new APIs should return int32_t, not
> even UChar32, and that -1 should be used there as a "no character at all"
> value.
> The check for whether there is a code point or an "I ran into the iteration
> bounds" condition is a simple and fast sign check.
> This would also be easy for Java (which does not have an unsigned 32-bit
> type).
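A minimal sketch of the convention proposed in the quoted text, assuming a plain uint16_t array stands in for 16-bit UChar text. The names demo_next_unit and demo_count_units are hypothetical and not ICU functions. A genuine U+FFFF code unit comes back as 0xffff, so it stays distinguishable from the -1 "none" value, and the caller's loop needs only a sign check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not an ICU function: returns the next UTF-16 code
 * unit (0..0xffff) widened to int32_t, or -1 once the bounds are reached. */
static int32_t demo_next_unit(const uint16_t *s, int32_t length, int32_t *pIndex)
{
    assert(*pIndex >= 0);
    if (*pIndex >= length) {
        return -1;              /* "no character at all" */
    }
    return s[(*pIndex)++];      /* uint16_t widens to 0..0xffff, never negative */
}

/* Hypothetical caller: counts code units; the loop test is one sign check. */
static int32_t demo_count_units(const uint16_t *s, int32_t length)
{
    int32_t index = 0, count = 0, c;
    while ((c = demo_next_unit(s, length, &index)) >= 0) {
        ++count;
    }
    return count;
}
```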

The issue with that is that if you ask for UChar32 as input, but return
int32_t, there will be a lot of casting going on between these two types to
keep the compilers quiet. Returning ((UChar32) -1) avoids that.
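A rough illustration of the casting argument, assuming (as the thread implies) that UChar32 was an unsigned 32-bit type at the time. DemoUChar32, DEMO_NO_CHAR, demo_char_at and demo_is_bmp are hypothetical names. Because the error value is expressed in the same type the API takes as input, a result can be passed straight back in and compared against the sentinel without casts or signed/unsigned mismatches (the kind GCC's -Wsign-conversion reports).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: an unsigned code point type. */
typedef uint32_t DemoUChar32;

/* Error value in the API's own type: ((UChar32) -1), i.e. 0xffffffff. */
#define DEMO_NO_CHAR ((DemoUChar32)-1)

/* Hypothetical lookup: the code point at index, or DEMO_NO_CHAR. */
static DemoUChar32 demo_char_at(const DemoUChar32 *cps, int32_t length, int32_t index)
{
    assert(cps != 0);
    if (index < 0 || index >= length) {
        return DEMO_NO_CHAR;
    }
    return cps[index];
}

/* Hypothetical consumer that takes the same type as input. */
static int demo_is_bmp(DemoUChar32 c)
{
    return c <= 0xffff;
}
```

With this shape, `demo_is_bmp(demo_char_at(cps, n, i))` compiles without a cast, which is the point being made about returning ((UChar32) -1).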

> UCharIterator:
> I am getting second thoughts on this API that I made public just a few
> days ago.
> I propose to change it as follows:
>
[...]
> 
> - change the return value of current(), next(), previous()
>    from UChar to int32_t as described above,
>    keep returning code _units_ 0..ffff, or -1 for "none"
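A minimal sketch of the proposed current()/next()/previous() change, using a hypothetical DemoUnitIter struct rather than ICU's actual UCharIterator. The exact movement semantics here (next() returns the unit at the current position and then advances; previous() steps back and then returns) are an assumption of the sketch. Each function returns a code unit 0..0xffff as int32_t, or -1 for "none".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical iterator over UTF-16 code units; not ICU's UCharIterator. */
typedef struct {
    const uint16_t *s;
    int32_t length;
    int32_t index;   /* position of the current unit, 0..length */
} DemoUnitIter;

static void demo_iter_init(DemoUnitIter *it, const uint16_t *s, int32_t length)
{
    it->s = s;
    it->length = length;
    it->index = 0;
}

/* Unit at the current position, or -1 at the end; does not move. */
static int32_t demo_iter_current(const DemoUnitIter *it)
{
    assert(0 <= it->index && it->index <= it->length);
    return it->index < it->length ? it->s[it->index] : -1;
}

/* Returns the current unit and advances, or -1 at the end. */
static int32_t demo_iter_next(DemoUnitIter *it)
{
    assert(0 <= it->index && it->index <= it->length);
    return it->index < it->length ? it->s[it->index++] : -1;
}

/* Steps back one unit and returns it, or -1 at the start. */
static int32_t demo_iter_previous(DemoUnitIter *it)
{
    assert(0 <= it->index && it->index <= it->length);
    return it->index > 0 ? it->s[--it->index] : -1;
}
```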

Argh. I don't want to push for UChar32 here, because UChar32 is consistently
used to denote a code point, and these APIs return code units. But then
int32_t introduces a third data type into the APIs, just for those cases
where we need the "error value".

Now, if UChar32 became signed, an int32_t could be assigned to it without
compiler complaints, and it would also be consistent with Java. In C, one
could still use the different names int32_t and UChar32 to distinguish the
two situations (code unit + error value vs. code point + error value).
(In Java, it will be hard to tell whether a 32-bit int is a code unit or a
code point, but that's a limitation of the language.)
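A small sketch of what a signed UChar32 would allow, assuming it were a typedef of int32_t. DemoSignedUChar32 and demo_next_code_point are hypothetical names. Code units are read as int32_t (code unit + error value), combined into a DemoSignedUChar32 (code point + error value), and -1 flows through both without any casts because the two names denote the same C type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a signed UChar32: same C type as int32_t,
 * so assignments in either direction need no casts. */
typedef int32_t DemoSignedUChar32;   /* 0..0x10ffff, or -1 */

/* Hypothetical: reads one or two UTF-16 code units and returns the code
 * point, or -1 when no unit is left. Unpaired surrogates are returned as-is. */
static DemoSignedUChar32 demo_next_code_point(const uint16_t *s, int32_t length,
                                              int32_t *pIndex)
{
    int32_t lead, trail;             /* code units, 0..0xffff */
    assert(*pIndex >= 0);
    if (*pIndex >= length) {
        return -1;
    }
    lead = s[(*pIndex)++];
    if (lead >= 0xd800 && lead <= 0xdbff && *pIndex < length) {
        trail = s[*pIndex];
        if (trail >= 0xdc00 && trail <= 0xdfff) {
            ++*pIndex;
            return ((lead - 0xd800) << 10) + (trail - 0xdc00) + 0x10000;
        }
    }
    return lead;                     /* int32_t unit becomes a code point, no cast */
}
```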

In any case, handling all of these situations is not going to be pretty.

YA

