List: perl5-porters
Subject: Re: RFC: Processing Unicode non-characters and code points beyond Unicode's
From: SADAHIRO Tomoyuki <bqw10602 () nifty ! com>
Date: 2010-11-30 14:38:49
Message-ID: 20101130233849.57C3.CB027F2D () nifty ! com
On Thu, 25 Nov 2010 10:23:17 -0700
karl williamson wrote:
> I think we have agreement here, but let me sum up to be sure.
>
> 1) The current API will change (because it doesn't really have the
> capability to do things properly) so that by default the internal utf8
> encoding/decoding functions will allow non-character code points and
> above-Unicode code points. The default for surrogates will continue to
> be that they are not allowed. It will be possible to specify
> disallowing non-characters and beyond-Unicode characters by appropriate
> flags. (Actually, the current API for utf8n_to_uvuni() always allows
> above-Unicode code points; I would extend it to allow excluding these.)
> Existing macros that match subsets of the non-character code points
> will be removed and replaced by a single macro with a new name that
> matches all of them.
I don't object to defining a new flag macro that makes
utf8n_to_uvuni() disallow beyond-Unicode code points (uv >= 0x110000)
and, if necessary, changing the flags passed to utf8n_to_uvuni()
where it is called in the perl core.
However, I think that removing any existing macro, some of which have
been in place since perl 5.7.x or around 5.8.0, is a backward
compatibility problem: any XS code that uses a removed macro
can no longer be built.
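
To illustrate the compatibility concern, here is a minimal sketch of how a new all-in-one macro could be added while the older subset macros stay defined, so that existing callers still compile. Every name below (the DEMO_ALLOW_* macros, demo_accepts(), demo_check_aliases()) is hypothetical and invented for this example; none of them is the real utf8.h API, and the flag values are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical legacy flags, each matching one subset of the
 * non-character code points. These are NOT the real utf8.h macros. */
#define DEMO_ALLOW_FDD0       0x0010u  /* U+FDD0..U+FDEF */
#define DEMO_ALLOW_FFFE_FFFF  0x0020u  /* U+xFFFE and U+xFFFF in every plane */

/* Hypothetical new macro matching all non-characters. It is defined in
 * terms of the legacy flags, which remain available to old callers. */
#define DEMO_ALLOW_NONCHAR    (DEMO_ALLOW_FDD0 | DEMO_ALLOW_FFFE_FFFF)

/* Return 1 if code point uv is acceptable under flags, else 0. */
int demo_accepts(uint32_t uv, uint32_t flags)
{
    if (uv >= 0xFDD0 && uv <= 0xFDEF)
        return (flags & DEMO_ALLOW_FDD0) != 0;
    if ((uv & 0xFFFE) == 0xFFFE && uv <= 0x10FFFF)
        return (flags & DEMO_ALLOW_FFFE_FFFF) != 0;
    return 1;  /* not a non-character; other checks are out of scope here */
}

/* Sanity check: the consolidated flag must cover every legacy subset. */
void demo_check_aliases(void)
{
    assert((DEMO_ALLOW_NONCHAR & DEMO_ALLOW_FDD0) == DEMO_ALLOW_FDD0);
    assert((DEMO_ALLOW_NONCHAR & DEMO_ALLOW_FFFE_FFFF) == DEMO_ALLOW_FFFE_FFFF);
}
```

If the two legacy #define lines were deleted instead, any caller that still writes demo_accepts(uv, DEMO_ALLOW_FFFE_FFFF) would fail at compile time with an undeclared identifier, which is the kind of breakage an XS module would see.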
The API doc for utf8n_to_uvuni() in perl 5.12.2 (latest maint)
states (see http://perldoc.perl.org/perlapi.html#utf8n_to_uvuni ):

    If s does not point to a well-formed UTF-8 character,
    the behaviour is dependent on the value of flags:

    [snip]

    The flags can also contain various flags to allow
    deviations from the strict UTF-8 encoding (see utf8.h).

        UV utf8n_to_uvuni(const U8 *s, STRLEN curlen,
                          STRLEN *retlen, U32 flags)

So this document appears to permit perl users to pass the macros
defined in utf8.h as flags to utf8n_to_uvuni().
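
As a rough sketch of the flag-driven behaviour under discussion, the following simplified decoder accepts above-Unicode code points by default and rejects them only when the caller passes an opt-in flag. The function demo_utf8n_to_uv() and the macro DEMO_DISALLOW_SUPER are hypothetical names made up for this example; this is not the perl implementation. It assumes sequences of at most four bytes, does not reject overlong encodings, surrogates or non-characters, and reports failure by setting *retlen to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opt-in flag: reject code points >= 0x110000. */
#define DEMO_DISALLOW_SUPER 0x0001u

/* Simplified decoder for sequences of 1 to 4 bytes.
 * On success returns the code point and stores the byte length in *retlen.
 * On failure returns 0 and stores 0 in *retlen. */
uint32_t demo_utf8n_to_uv(const unsigned char *s, size_t curlen,
                          size_t *retlen, uint32_t flags)
{
    size_t len, i;
    uint32_t uv;

    assert(s != NULL && retlen != NULL);
    *retlen = 0;
    if (curlen == 0)
        return 0;

    /* The start byte determines the expected sequence length. */
    if (s[0] < 0x80)                { len = 1; uv = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; uv = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; uv = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; uv = s[0] & 0x07; }
    else
        return 0;  /* stray continuation byte or longer form: unsupported */

    if (len > curlen)
        return 0;  /* truncated input */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;  /* not a continuation byte */
        uv = (uv << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Above-Unicode values are allowed unless the caller opts out. */
    if ((flags & DEMO_DISALLOW_SUPER) && uv >= 0x110000)
        return 0;

    *retlen = len;
    return uv;
}
```

For example, the bytes F4 90 80 80 decode to 0x110000 when flags is 0, and are rejected when flags contains DEMO_DISALLOW_SUPER.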
Regards,
SADAHIRO Tomoyuki