List: kde-core-devel
Subject: Re: RFC - Separate filesystem encoding from document encoding
From: Michael Ritzert <kde () ritzert ! de>
Date: 2002-02-12 12:12:48
Hi Ryan,
that's actually trivial: '/' is an ASCII character, and an ASCII octet never
appears in a UTF-8 stream except with its original meaning. This holds for
every ASCII character, not just '/'.
The reason this works is that every octet of a UTF-8 sequence that encodes a
non-ASCII character has its high bit set, while ASCII octets have it clear.
This follows directly from the definition in RFC 2279:
UCS-4 range (hex.)    UTF-8 octet sequence (binary)
0000 0000-0000 007F   0xxxxxxx
0000 0080-0000 07FF   110xxxxx 10xxxxxx
0000 0800-0000 FFFF   1110xxxx 10xxxxxx 10xxxxxx
0001 0000-001F FFFF   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
0020 0000-03FF FFFF   111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
0400 0000-7FFF FFFF   1111110x 10xxxxxx ... 10xxxxxx
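
To make the table concrete, here is a minimal sketch in Python. The helper
names utf8_encode and split_path are hypothetical, chosen only for
illustration; the encoder follows the RFC 2279 table row by row (including the
5- and 6-octet forms) and is not meant as a replacement for a real codec.

```python
def utf8_encode(cp):
    """Encode one code point following the RFC 2279 table (1 to 6 octets)."""
    if cp < 0x80:
        # First table row: ASCII is passed through as a single octet.
        return bytes([cp])
    # Remaining rows: (exclusive upper limit, lead-octet marker,
    # number of continuation octets).
    rows = ((0x800, 0xC0, 1), (0x10000, 0xE0, 2), (0x200000, 0xF0, 3),
            (0x4000000, 0xF8, 4), (0x80000000, 0xFC, 5))
    for limit, lead, n in rows:
        if cp < limit:
            # The lead octet carries the topmost payload bits.
            octets = [lead | (cp >> (6 * n))]
            # Each continuation octet is 10xxxxxx with 6 payload bits.
            for i in range(n - 1, -1, -1):
                octets.append(0x80 | ((cp >> (6 * i)) & 0x3F))
            return bytes(octets)
    raise ValueError("code point out of range")


def split_path(raw):
    """Split a UTF-8 encoded path on the '/' octet (0x2F).

    This is safe on the raw bytes because no octet of a multi-octet
    sequence is below 0x80, so 0x2F can only ever mean '/'.
    """
    return raw.split(b"/")


if __name__ == "__main__":
    for part in split_path("/home/日本語/é.txt".encode("utf-8")):
        print(part.decode("utf-8"))
```

Every lead and continuation octet produced for a code point of 0x80 or above
has the form 1xxxxxxx, so code that splits a path on the '/' octet never has
to decode the string first.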
Michael
On Sunday, 10 February 2002 20:19, Ryan Cumming wrote:
> On February 10, 2002 h:41, Neil Stevens wrote:
> > KDE and Qt's multiple text codec support affords great flexibility
> > throughout KDE. Users can read and write documents in any encoding KDE
> > supports, viewing and using any character in unicode. But, for filenames
> > they are tied to the encoding of their own language.
> >
> > I think it'd be a good idea for the user's filesystem encoding to be
> > optionally different from his language encoding, specifically to allow
> > the use of unicode filenames.
> >
> > Comments?
>
> I'd like to see how you would guarantee the Unix path delimiter ("/")
> wouldn't show up in the UTF-8 stream.
>
> -Ryan