List:       icu
Subject:    UTF_EXPECTED_LENGTH
From:       "Zartaj T. Majeed" <zmajeed () adobe ! com>
Date:       2002-09-24 19:16:53

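No. UTF_CHAR_LENGTH takes a UChar32, a Unicode scalar value.
I have needed a function that will take a single UTF-8 byte and tell
me how many bytes the complete character should occupy, counting that
byte, so that one less than the result is the number still to follow.
So something like this, returning 0 for a byte that cannot start a character: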

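/* == binds tighter than & in C, so each mask test needs its own
   parentheses. uchar is assumed to be an unsigned byte value (0..255). */
#define UTF8_EXPECTED_LENGTH(uchar) \
((uchar) < 0x80 ? 1 : \
(((uchar) & 0xe0) == 0xc0) ? 2 : \
(((uchar) & 0xf0) == 0xe0) ? 3 : \
(((uchar) & 0xf8) == 0xf0) ? 4 : 0)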
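
To show how such a length lookup might be used, here is a minimal sketch written as plain functions rather than a macro. The names utf8_expected_length and utf8_count_chars are hypothetical and are not part of ICU. The check is only structural: it classifies the lead byte by its high bits and does not validate trail bytes, overlong forms, or lead bytes such as 0xC0 that never occur in well-formed UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ICU API): total number of bytes implied by
   a UTF-8 lead byte, or 0 if b is a trail byte or cannot start a sequence.
   Taking uint8_t avoids surprises when plain char is signed. */
static int utf8_expected_length(uint8_t b)
{
    if (b < 0x80)           return 1;  /* 0xxxxxxx: single byte   */
    if ((b & 0xe0) == 0xc0) return 2;  /* 110xxxxx: 2-byte lead   */
    if ((b & 0xf0) == 0xe0) return 3;  /* 1110xxxx: 3-byte lead   */
    if ((b & 0xf8) == 0xf0) return 4;  /* 11110xxx: 4-byte lead   */
    return 0;                          /* 10xxxxxx or 11111xxx    */
}

/* Hypothetical usage: count characters in s[0..n) by hopping from lead
   byte to lead byte. Returns -1 if a byte cannot start a character or
   if the buffer ends before the expected number of bytes is available. */
static long utf8_count_chars(const uint8_t *s, size_t n)
{
    size_t i = 0;
    long count = 0;

    while (i < n) {
        int len = utf8_expected_length(s[i]);
        if (len == 0 || (size_t)len > n - i) {
            return -1;
        }
        i += (size_t)len;
        count++;
    }
    return count;
}
```

The 0 return is what lets a caller tell "this byte is in the middle of a character" apart from a real length, which is the error value asked about in the original question.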

Zartaj

> Are you asking about UTF_CHAR_LENGTH (or UTF8_CHAR_LENGTH or 
> UTF16_CHAR_LENGTH)?  Right now, none of them return an error value, and 
> I'm not sure why an error value is needed (maybe if you had malformed 
> UTF-8).
> 
> George Rhoten
> IBM Globalization Center of Competency/ICU  San Jose, CA, USA
> 
> 
> 
> 
> "Zartaj T. Majeed" <zmajeed@adobe.com>
> Sent by: icu-admin@www-124.southbury.usf.ibm.com
> 09/24/2002 11:42 AM
> 
>  
>         To:     "icu list" <icu@www-124.southbury.usf.ibm.com>
>         cc: 
>         Subject:        RE: icu4c api proposal: simplify UTF macros
> 
>  
> 
> 
> Is there a macro that returns the expected length of a character
> given a code unit? That is, the number of subsequent code units
> needed to form a valid character would be one less than the result.
> For a single-unit character or the first unit of a multi-unit character,
> the macro would return the expected length. For any other code unit,
> it would return an error value.
> 
> Thanks,
> Zartaj
> 
> ______________________________________________
> icu mailing list
> icu@oss.software.ibm.com
> http://oss.software.ibm.com/developerworks/oss/mailman/listinfo/icu
> 
> 