
List:       php-i18n
Subject:    Re: [PHP-I18N] UTF-8 string validity detection
From:       Moriyoshi Koizumi <moriyoshi () at ! wakwak ! com>
Date:       2003-05-13 18:58:12

"Cestmir Hybl" <cestmir@nustep.net> wrote:
<snip>
> preg_match("/^($ptrASCII|$ptr2Octet|$ptr3Octet|$ptr4Octet|$ptr5Octet|$ptr6Octet)*$/s", $AStr);
> }
> 
> but it tends to segfault on longer input (~10kB of text).

That sounds like a genuine bug. Could you file a new bug report at
http://bugs.php.net/ with a description of how it segfaults (in what
situation, etc.)? It is most likely a problem in PCRE.
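
Until that is sorted out, one possible workaround is to let PCRE validate the
subject itself rather than matching a repeated group against every character.
The sketch below assumes the crash comes from the (...)* repetition over a
long subject, which is not confirmed in this thread, and the function name is
made up for illustration. It relies on the /u modifier, which makes PCRE check
that the subject is well-formed UTF-8 before matching. Note that current PCRE
versions follow the stricter RFC 3629 rules, so 5- and 6-octet sequences that
the pattern above accepts would be rejected here.

```php
<?php
// Hypothetical helper; the name is not from the original thread.
function is_valid_utf8_pcre($str)
{
    // The empty pattern matches any valid subject immediately, so the only
    // real work is PCRE's own UTF-8 check triggered by the /u modifier.
    // On malformed input preg_match() does not return 1 (it returns false
    // or 0 depending on the PHP version), so compare strictly against 1.
    return @preg_match('//u', $str) === 1;
}

var_dump(is_valid_utf8_pcre("caf\xC3\xA9")); // well-formed 2-octet sequence
var_dump(is_valid_utf8_pcre("caf\xC3"));     // truncated sequence
```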

> I've performed a couple of tests and your solution seems to work fine, though
> there's no specification of how exactly mb_convert_encoding() behaves on
> incorrect input or how this may change in the future. Stability of the
> UTF-8 <-> UCS-4 round trip seems to be guaranteed by RFC 2279.

If that is a concern, you might be better off using the iconv() function
instead of mb_convert_encoding(), as the behaviour of iconv() is clearly
defined in the Single UNIX Specification.

http://www.opengroup.org/onlinepubs/007908799/xsh/iconv.html
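
As a rough illustration, the round-trip check discussed above could be done
with iconv() along these lines. This is only a sketch: the function name is
made up, and it assumes the system's iconv implementation knows the encoding
name "UCS-4BE" (glibc and GNU libiconv do). How strictly malformed input is
rejected still depends on that implementation.

```php
<?php
// Hypothetical helper; the name and the UCS-4BE choice are assumptions.
function is_valid_utf8_iconv($str)
{
    if ($str === '') {
        return true; // nothing to convert, trivially valid
    }
    // iconv() returns false and raises a notice on an illegal or
    // incomplete input sequence; @ silences the notice.
    $ucs4 = @iconv('UTF-8', 'UCS-4BE', $str);
    if ($ucs4 === false) {
        return false;
    }
    // Convert back and require a byte-identical result.
    $back = @iconv('UCS-4BE', 'UTF-8', $ucs4);
    return $back === $str;
}

var_dump(is_valid_utf8_iconv("caf\xC3\xA9")); // well-formed 2-octet sequence
var_dump(is_valid_utf8_iconv("caf\xC3"));     // truncated sequence
```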

Moriyoshi



-- 
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
