List: freebsd-commits-all
Subject: Re: svn commit: r265095 - head/lib/libc/locale
From: Pedro Giffuni <pfg () freebsd ! org>
Date: 2014-04-30 21:43:20
Message-ID: 53616E78.3010301 () freebsd ! org
On 04/30/14 16:10, Jilles Tjoelker wrote:
> On Tue, Apr 29, 2014 at 03:25:57PM +0000, Pedro F. Giffuni wrote:
>> Author: pfg
>> Date: Tue Apr 29 15:25:57 2014
>> New Revision: 265095
>> URL: http://svnweb.freebsd.org/changeset/base/265095
>> Log:
>> citrus: Avoid invalid code points.
>>
>> From the OpenBSD log:
>> The UTF-8 decoder should not accept byte sequences which decode to unicode
>> code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF.
>> http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
>> http://unicode.org/faq/utf_bom.html#utf8-4
>> Reported by: Stefan Sperling
>> Obtained from: OpenBSD
>> MFC after: 5 days
>> Modified:
>> head/lib/libc/locale/utf8.c
>> Modified: head/lib/libc/locale/utf8.c
>> ==============================================================================
>> --- head/lib/libc/locale/utf8.c Tue Apr 29 15:12:23 2014 (r265094)
>> +++ head/lib/libc/locale/utf8.c Tue Apr 29 15:25:57 2014 (r265095)
>> @@ -203,6 +203,14 @@ _UTF8_mbrtowc(wchar_t * __restrict pwc,
>>  		errno = EILSEQ;
>>  		return ((size_t)-1);
>>  	}
>> +	if ((wch >= 0xd800 && wch <= 0xdfff) ||
>> +	    wch == 0xfffe || wch == 0xffff) {
>> +		/*
>> +		 * Malformed input; invalid code points.
>> +		 */
>> +		errno = EILSEQ;
>> +		return ((size_t)-1);
>> +	}
>>  	if (pwc != NULL)
>>  		*pwc = wch;
>>  	us->want = 0;
> Hmm, I think U+FFFE and U+FFFF should be passed through normally.
> According to http://www.unicode.org/faq/private_use.html they are
> "noncharacters" (basically a more private variant of private-use
> characters) and must be mapped through UTFs.
>
> The part that rejects U+D800 to U+DFFF is definitely correct, though.
> http://unicode.org/faq/utf_bom.html#utf8-4 says to do only that.
>
> The part about U+FFFE and U+FFFF in
> http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 seems out of date.
> Note the last modified date of that page: 2009-05-11.
>
> On another note, everything above U+0010FFFF should perhaps be rejected
> since those codes, which cannot be encoded in UTF-16, were excluded from
> Unicode and ISO 10646.
>
Thank you! I will fix the UTF-8 part soon.
Pedro.
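
For reference, the policy suggested in the review above (reject surrogates, let U+FFFE and U+FFFF through, and perhaps reject everything above U+0010FFFF) could be sketched roughly as follows. This is only an illustration, not the follow-up commit: the helper name utf8_codepoint_ok is hypothetical and does not exist in utf8.c, and uint32_t stands in for wchar_t so the sketch does not depend on the platform's wchar_t width.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of head/lib/libc/locale/utf8.c.
 * Returns true if a decoded value should be accepted under the
 * policy suggested in the review.
 */
static bool
utf8_codepoint_ok(uint32_t wch)
{
	/* UTF-16 surrogates must never be decoded from UTF-8. */
	if (wch >= 0xd800 && wch <= 0xdfff)
		return (false);
	/* Values above U+10FFFF cannot be encoded in UTF-16. */
	if (wch > 0x10ffff)
		return (false);
	/* Noncharacters such as U+FFFE and U+FFFF pass through. */
	return (true);
}
```

In _UTF8_mbrtowc() a check of this shape would sit where the new block was added, setting errno = EILSEQ and returning (size_t)-1 whenever the helper returns false.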