List: php-i18n
Subject: Re: [PHP-I18N] UTF-8 string validity detection
From: Moriyoshi Koizumi <moriyoshi () at ! wakwak ! com>
Date: 2003-05-13 18:58:12
"Cestmir Hybl" <cestmir@nustep.net> wrote:
<snip>
> preg_match("/^($ptrASCII|$ptr2Octet|$ptr3Octet|$ptr4Octet|$ptr5Octet|$ptr6Octet)*$/s", $AStr);
> }
>
> but it tends to segfault on longer input (~10kB of text).
That sounds like a genuine bug. Could you file a new bug report at
http://bugs.php.net/ with a description of how it segfaults (in what
situation, with what input, etc.)? It's most likely a PCRE problem.
> I've performed a couple of tests and your solution seems to work fine, though
> there's no specification of how exactly mb_convert_encoding() behaves on
> incorrect input or how this may change in the future. Stability of the UTF-8 <->
> UCS-4 round trip seems to be guaranteed by RFC 2279.
If that is a concern, you might be better off using the iconv() function
instead of mb_convert_encoding(), as the behaviour of iconv() is clearly
defined in the Single UNIX Specification.
http://www.opengroup.org/onlinepubs/007908799/xsh/iconv.html
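As a rough sketch of the iconv() route (not code taken from this thread): the hypothetical helper is_valid_utf8_iconv() below converts the input from UTF-8 to UCS-4BE and back, and accepts it only if both conversions succeed and the round trip reproduces the original bytes. It assumes PHP 7 or later with the iconv extension, and an underlying iconv implementation that fails on illegal input sequences as the specification describes. Some implementations are more lenient, so it is worth testing against yours.

```php
<?php
// Hypothetical helper, a sketch only: validates UTF-8 by an iconv() round trip.
// Assumption: the underlying iconv implementation reports an error for an
// illegal input sequence, in which case PHP's iconv() returns false.
function is_valid_utf8_iconv(string $s): bool
{
    // An empty string is trivially valid; skip the conversion entirely.
    if ($s === '') {
        return true;
    }

    // @ suppresses the notice PHP raises when iconv() hits an illegal sequence.
    $ucs4 = @iconv('UTF-8', 'UCS-4BE', $s);
    if ($ucs4 === false) {
        return false;
    }

    // Convert back and require byte-for-byte equality with the input.
    $back = @iconv('UCS-4BE', 'UTF-8', $ucs4);
    return $back === $s;
}
```

Usage would look like `if (!is_valid_utf8_iconv($AStr)) { /* reject the input */ }`. No regular expression is involved, so the input length should not matter in the way it does for the preg_match() approach quoted above.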
Moriyoshi