List: openjdk-nio-dev
Subject: Unix paths as bytes
From: pjenvey () underboss ! org (Philip Jenvey)
Date: 2009-05-06 2:14:28
Message-ID: 35FCD914-63ED-41CB-82F8-129B4D3E4BCD () underboss ! org
On May 4, 2009, at 11:24 PM, Martin Buchholz wrote:
>>
>> There's no case where 2 different sets of bytes would convert to the
>> same chars
>
> I don't understand this. There are many locales with encodings with
> non-unique representations. Until the UTF-8 security reform,
> even UTF-8 had non-unique representations.
> The Python PEP seems designed to be used with
> any system encoding, not just UTF-8.
OK, like ISO-2022-JP and ShiftJIS. These did come up in the PEP
discussion on the python-dev ML.
They weren't highly regarded, as they're pretty broken as Unix locales.
The POSIX spec describes these "locking shift encodings" as fishy or
invalid for its character set [1], and they're incompatible with ASCII.
Red Hat, Debian and others disable them as locales by default.
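To make the two complaints concrete, here is a rough sketch using Python's stock "iso2022_jp" codec (only ISO-2022-JP is covered; ShiftJIS is not shown). The helper names are made up for illustration. It shows that a redundant escape sequence gives a second byte spelling of the same text, and that a non-ASCII character is carried entirely by bytes in the ASCII range.

```python
CODEC = "iso2022_jp"  # Python's name for the ISO-2022-JP codec


def same_text(a: bytes, b: bytes) -> bool:
    """Hypothetical helper: True if two byte strings decode to the same chars."""
    return a.decode(CODEC) == b.decode(CODEC)


def is_seven_bit(data: bytes) -> bool:
    """Hypothetical helper: True if every byte is in the ASCII range."""
    return all(byte < 0x80 for byte in data)


plain = b"abc"
# ESC ( B selects ASCII, which is already the initial state, so the
# escape is redundant and both spellings decode to the same text.
redundant = b"\x1b(Babc"
print(same_text(plain, redundant))

# HIRAGANA LETTER A encodes to escape sequences plus two bytes that,
# read as ASCII, would look like ordinary punctuation.
kana = "\u3042".encode(CODEC)
print(is_seven_bit(kana), kana.decode(CODEC) == "\u3042")
```

So a byte such as 0x24 cannot be interpreted without knowing the current shift state, which is what makes these encodings awkward for file names.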
These are indeed problematic; I guess they just weren't a deal breaker
for the simpler scheme, which is designed to be used with any system
encoding that isn't annoying. The PEP mentions:
"Encodings that are not compatible with ASCII are not supported by
this specification; bytes in the ASCII range that fail to decode will
cause an exception. It is widely agreed that such encodings should not
be used as locale charsets."
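For reference, a minimal sketch of the round trip the scheme is after, assuming the PEP in question is PEP 383 and using the "surrogateescape" error handler it defines (available in Python 3.1 and later). The function names decode_path and encode_path are invented for this example, and UTF-8 stands in for whatever ASCII-compatible locale charset is in use.

```python
def decode_path(raw: bytes, encoding: str = "utf-8") -> str:
    """Illustrative helper: decode path bytes, smuggling each undecodable
    byte >= 0x80 through as a lone surrogate in U+DC80..U+DCFF."""
    return raw.decode(encoding, errors="surrogateescape")


def encode_path(name: str, encoding: str = "utf-8") -> bytes:
    """Illustrative helper: inverse of decode_path; lone surrogates are
    turned back into the original bytes."""
    return name.encode(encoding, errors="surrogateescape")


# A file name written under a Latin-1 locale: 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"
name = decode_path(raw)

# ascii() is used because a str holding lone surrogates cannot be
# written to a UTF-8 terminal directly.
print(ascii(name))

# The original bytes come back unchanged.
print(encode_path(name) == raw)
```

Since only bytes of 0x80 and above are escaped, the ASCII portion of a path always means what it says, which is why the PEP can afford to rule out encodings that are not ASCII compatible.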
[1]: http://opengroup.org/onlinepubs/007908775/xbd/charset.html#tag_001_002
--
Philip Jenvey