List:       php-i18n
Subject:    RE: [PHP-I18N] proposal: unification of the grapheme_extract functions
From:       "Texin, Tex" <Tex.Texin () netapp ! com>
Date:       2008-05-13 23:01:37
Message-ID: 819912BDAE6BCB4097883B226DA473B10A8467D3 () SACEXMV02 ! hq ! netapp ! com

 

> -----Original Message-----
> From: Ed Batutis [mailto:ed@batutis.com] 
> 
> Sounds like a break iterator with a bit of extra info to 
> support multiple encodings. Or perhaps you mean to wrap all 
> string operations?
> 
> =Ed


I mean just for Unicode, not other encodings, and yes, to wrap all string
ops, so that the extra info can be maintained across any operations
performed on the string.

I would maintain some info, such as whether the string is all ASCII,
whether there are any graphemes, etc., so that lower-cost functions can be
used when possible, plus some information about where each character in
the string begins, for fast indexing on short strings. For longer strings,
I might remember the beginnings of lines and their character and byte
offsets, and also previous and next character info.
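
A minimal sketch of the cached-metadata idea, written in Python for
brevity rather than PHP. The class and method names (IndexedString,
char_at) are hypothetical and not part of any existing API. It only
covers the all-ASCII flag and a per-character byte-offset table for
short strings, and it indexes code points, not grapheme clusters:

```python
class IndexedString:
    """Hypothetical wrapper: UTF-8 bytes plus cached metadata."""

    def __init__(self, text: str):
        self.data = text.encode("utf-8")
        # All-ASCII strings use one byte per character, so the cheap
        # byte-indexed path applies and no offset table is needed.
        self.is_ascii = len(self.data) == len(text)
        self.offsets = None
        if not self.is_ascii:
            # Byte offset where each code point begins: every byte that
            # is not a UTF-8 continuation byte (10xxxxxx) starts one.
            self.offsets = [
                i for i, b in enumerate(self.data) if (b & 0xC0) != 0x80
            ]

    def __len__(self) -> int:
        # Length in code points, not bytes.
        return len(self.data) if self.is_ascii else len(self.offsets)

    def char_at(self, index: int) -> str:
        n = len(self)
        if not 0 <= index < n:
            raise IndexError(index)
        if self.is_ascii:
            # Low-cost path: character index equals byte index.
            return chr(self.data[index])
        start = self.offsets[index]
        end = self.offsets[index + 1] if index + 1 < n else len(self.data)
        return self.data[start:end].decode("utf-8")
```

A real wrapper would also have to update or invalidate this metadata on
every modifying operation, which is where most of the cost would sit.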

It would be something you might use on certain frequently used strings
that are actually processed. PHP does a lot of just moving strings around
without parsing or modifying them, so it isn't cost-effective for all
strings.

Just a thought for the future.

-- 
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

