'Unix paths as bytes'


List:       openjdk-nio-dev
Subject:    Unix paths as bytes
From:       pjenvey () underboss ! org (Philip Jenvey)
Date:       2009-05-06 2:14:28
Message-ID: 35FCD914-63ED-41CB-82F8-129B4D3E4BCD () underboss ! org


On May 4, 2009, at 11:24 PM, Martin Buchholz wrote:
>>
>> There's no case where 2 different sets of bytes would convert to  
>> the same
>> chars
>
> I don't understand this.  There are many locales with encodings with  
> non-unique
> representations.  Until the UTF-8 security reform,
> even UTF-8 had non-unique representations.
> The Python PEP seems designed to be used with
> any system encoding, not just UTF-8.

Ok, like ISO-2022-JP, ShiftJIS. These did come up in the PEP  
discussion on the python-dev ML.

They weren't highly regarded, as they're pretty broken as Unix locales.
The POSIX spec describes these "locking shift encodings" as
fishy/invalid for its character set [1], and they're incompatible with
ASCII. Red Hat, Debian and others disable them as locales by default.
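
To make the non-unique representation concrete, here is a minimal sketch using Python's built-in iso2022_jp codec. The byte strings and variable names are illustrative assumptions, not anything taken from the PEP discussion. It shows a redundant "switch to ASCII" escape producing the same text from different bytes, and that the encoding reuses ASCII-range bytes for Japanese characters.

```python
# Sketch: ISO-2022-JP is stateful, so distinct byte strings can decode to
# the same characters. The sample values below are made up for illustration.
plain = b"abc"
shifted = b"\x1b(Babc"  # ESC ( B selects ASCII, which is already the initial state

same_text = plain.decode("iso2022_jp") == shifted.decode("iso2022_jp")
print(plain != shifted, same_text)

# The encoding is 7-bit: Japanese text is carried by bytes in the ASCII
# range after an escape sequence, which is why it is not ASCII-compatible.
encoded = "\u3042".encode("iso2022_jp")  # HIRAGANA LETTER A
print(encoded, all(b < 0x80 for b in encoded))
```

A decode-then-encode round trip through such a codec cannot be relied on to reproduce the original file name bytes, which is the problem Martin is pointing at.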

These are indeed problematic; I guess they just weren't a deal breaker
for the simpler scheme, which is designed to be used with any system
encoding that isn't annoying. The PEP mentions:

"Encodings that are not compatible with ASCII are not supported by  
this specification; bytes in the ASCII range that fail to decode will  
cause an exception. It is widely agreed that such encodings should not  
be used as locale charsets."
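
For an ASCII-compatible encoding, the scheme can be sketched with the "surrogateescape" error handler that Python 3.1 and later ship for this purpose. The helper names decode_path and encode_path and the sample file names are hypothetical, and UTF-8 is assumed as the locale encoding. Undecodable bytes become lone surrogates, so different byte strings yield different strings and the round trip is lossless.

```python
# Sketch of the PEP's approach, assuming a UTF-8 locale.
# decode_path/encode_path are hypothetical helpers, not a library API.

def decode_path(raw: bytes, encoding: str = "utf-8") -> str:
    # Bytes that fail to decode are mapped to U+DC80..U+DCFF.
    return raw.decode(encoding, errors="surrogateescape")

def encode_path(text: str, encoding: str = "utf-8") -> bytes:
    # Lone surrogates in that range are turned back into the original bytes.
    return text.encode(encoding, errors="surrogateescape")

valid = "caf\u00e9".encode("utf-8")  # well-formed UTF-8
legacy = b"caf\xe9"                  # Latin-1 bytes, invalid as UTF-8

for raw in (valid, legacy):
    text = decode_path(raw)
    print(repr(raw), "->", ascii(text), encode_path(text) == raw)
```

In practice os.fsdecode and os.fsencode apply the same handler with the file system encoding on Unix, so application code rarely needs its own helpers.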

[1]: http://opengroup.org/onlinepubs/007908775/xbd/charset.html#tag_001_002

--
Philip Jenvey

