List: php-i18n
Subject: RE: [PHP-I18N] proposal: unification of the grapheme_extract functions
From: "Texin, Tex" <Tex.Texin () netapp ! com>
Date: 2008-05-13 23:01:37
Message-ID: 819912BDAE6BCB4097883B226DA473B10A8467D3 () SACEXMV02 ! hq ! netapp ! com
> -----Original Message-----
> From: Ed Batutis [mailto:ed@batutis.com]
>
> Sounds like a break iterator with a bit of extra info to
> support multiple encodings. Or perhaps you mean to wrap all
> string operations?
>
> =Ed
I mean just for Unicode, not other encodings, and yes, to wrap all string
ops, so that the extra info can be maintained across any operation
performed on the string.
I would maintain some info, such as whether the string is all ASCII,
whether there are any graphemes, etc., so that lower-cost functions can be
used where possible, plus some information about where each character in
the string begins, for fast indexing on short strings. For longer strings,
I might remember the beginnings of lines and their character and byte
offsets. Also previous and next character info.
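A minimal sketch of the cached-metadata idea, written in Python rather than PHP for brevity. The class name IndexedUtf8 and its methods are hypothetical, not an existing PHP or ICU API, and it indexes code points only; grapheme clusters, line offsets, and keeping the cache in sync after modifications are left out.

```python
class IndexedUtf8:
    """Hypothetical wrapper: UTF-8 bytes plus cached indexing metadata."""

    def __init__(self, data: bytes):
        data.decode("utf-8")  # validate once; raises UnicodeDecodeError if malformed
        self.data = data
        # Cached flag: pure-ASCII strings can use plain byte indexing.
        self.is_ascii = data.isascii()
        # Byte offset where each code point begins (not built for ASCII).
        # A UTF-8 continuation byte always has the bit pattern 10xxxxxx.
        self.starts = None if self.is_ascii else [
            i for i, b in enumerate(data) if (b & 0xC0) != 0x80
        ]

    def __len__(self) -> int:
        """Length in code points."""
        return len(self.data) if self.is_ascii else len(self.starts)

    def byte_offset(self, index: int) -> int:
        """Byte offset of the code point at position `index`."""
        if not 0 <= index < len(self):
            raise IndexError(index)
        return index if self.is_ascii else self.starts[index]

    def char_at(self, index: int) -> str:
        """Code point at position `index`, found without rescanning the string."""
        begin = self.byte_offset(index)
        if self.is_ascii:
            return chr(self.data[begin])
        nxt = index + 1
        end = self.starts[nxt] if nxt < len(self.starts) else len(self.data)
        return self.data[begin:end].decode("utf-8")
```

For example, IndexedUtf8("na\u00efve".encode("utf-8")).char_at(2) looks up the byte offset in the cached table instead of walking the string from the start. The table costs one integer per character, which is why it only pays off on strings that are indexed repeatedly.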
It would be something you might use on certain frequently used strings
that are actually processed. PHP does a lot of just moving strings around
without parsing or modifying them, so it isn't cost-effective for all
strings.
Just a thought for the future.
--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php