On Sun, 28 May 2000, Waldo Bastian wrote:
> On Sun, 28 May 2000, Alexei Dets wrote:
> > > > I
> > > > don't know any server or client internet software that works this
> > > > way. URLs are always encoded using _local_ 8-bit encoding (and this
> > > > is a known problem - you can't pass 8-bit characters in URL because
> > > > local server and client encodings can be different and decoding will
> > > > produce different results :-).
> > >
> > > That's why we don't use local encoding but use utf8.
> > > See also http://www.w3.org/International/O-URL-and-ident and
> > > http://www.ietf.org/rfc/rfc2718.txt
> >
> > But what can you say to users that want to visit Internet sites _now_?
> > They _don't_ use UTF8 in URLs. Wait five or ten years until KDE2 can be
> > used in Internet? ;-) This sucks big time.
> I wanted to send all websites an e-mail telling them to change before we
> release KDE 2.0 :-)
>
> Well, I think the problem is related to the fact that we decode and
> re-encode the URL. I think rfc2718 advises against that. So I think that we
> can solve the problem as follows:
> When KURL is constructed from a real URL, we need to keep the URL around in
> its original encoded form. When the URL is needed again, KURL::url() should
> return the original encoding. When you make changes to the KURL, the
> original encoding is removed.

This also means that we must change the interface to the IO-slaves, because
currently we don't pass a URL but the Unicode path and query separately.
That has no chance of working, because we don't always know which encoding
to use when we send the path. By storing the original encoding in the URL we
can use that encoding when we send the path out into the world (e.g. with
http or ftp).

The URL will not always contain a (useful) original encoding. For example,
when the user typed the URL it will be Unicode encoded, which we can't send
to a remote server.
For this case the user must specify which encoding the protocol (http, ftp,
possibly others) should use. The local encoding is probably a good default
for this. (Which indeed makes you wonder what the advantage is of using utf8
within KURL.)

Note that the basic problem remains, independent of the encoding used,
because we can always get a URL with a different encoding and we must be
able to pass such a URL along "as is".

Cheers,
Waldo
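
To make the proposal above a bit more concrete, here is a minimal sketch in
C++. It is not the real KURL code: SketchUrl, fromEncoded(), fromUserInput(),
setDecoded() and hasOriginalEncoding() are hypothetical names, the whole URL
is treated as one string instead of being split into path and query, and the
fallback re-encoding simply percent-escapes raw bytes, which is an assumption
made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for KURL; none of these names are the real KDE API.
class SketchUrl {
public:
    // URL received "from the world": remember its exact encoded form.
    static SketchUrl fromEncoded(const std::string &encoded) {
        SketchUrl u;
        u.m_original = encoded;
        u.m_hasOriginal = true;
        u.m_decoded = percentDecode(encoded);
        return u;
    }

    // URL typed by the user: there is no original encoding to preserve.
    static SketchUrl fromUserInput(const std::string &text) {
        SketchUrl u;
        u.m_decoded = text;
        return u;
    }

    // Any modification drops the original encoding.
    void setDecoded(const std::string &decoded) {
        m_decoded = decoded;
        m_original.clear();
        m_hasOriginal = false;
    }

    bool hasOriginalEncoding() const { return m_hasOriginal; }

    // Return the original form untouched if we still have it,
    // otherwise fall back to re-encoding the decoded form.
    std::string url() const {
        return m_hasOriginal ? m_original : percentEncode(m_decoded);
    }

private:
    SketchUrl() : m_hasOriginal(false) {}

    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Turn each valid %XX escape into the raw byte it stands for.
    static std::string percentDecode(const std::string &in) {
        std::string out;
        for (std::size_t i = 0; i < in.size(); ++i) {
            if (in[i] == '%' && i + 2 < in.size()) {
                int hi = hexValue(in[i + 1]);
                int lo = hexValue(in[i + 2]);
                if (hi >= 0 && lo >= 0) {
                    out += static_cast<char>(hi * 16 + lo);
                    i += 2;
                    continue;
                }
            }
            out += in[i];
        }
        return out;
    }

    // Assumption: escape control bytes, space, '%' and all 8-bit bytes.
    static std::string percentEncode(const std::string &in) {
        static const char digits[] = "0123456789ABCDEF";
        std::string out;
        for (unsigned char c : in) {
            if (c <= 0x20 || c >= 0x7F || c == '%') {
                out += '%';
                out += digits[c >> 4];
                out += digits[c & 0x0F];
            } else {
                out += static_cast<char>(c);
            }
        }
        return out;
    }

    std::string m_original;  // exact bytes as received, if any
    std::string m_decoded;   // decoded form used internally
    bool m_hasOriginal;
};

// Small usage example for the sketch above.
inline void sketchUrlDemo() {
    SketchUrl u = SketchUrl::fromEncoded("/caf%e9");
    assert(u.url() == "/caf%e9");       // passed along "as is"
    u.setDecoded("/new path");
    assert(u.url() == "/new%20path");   // original gone, so re-encoded
}
```

The point of the sketch is only the bookkeeping: an untouched URL goes back
out byte for byte (note the lower-case "%e9" survives), and the choice of
encoding only comes up once the original form is gone.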