From kde-core-devel Tue Sep 27 17:18:00 2005
From: Thiago Macieira
Date: Tue, 27 Sep 2005 17:18:00 +0000
To: kde-core-devel
Subject: Re: KURL problem
Message-Id: <200509271418.10872.thiago () kde ! org>
X-MARC-Message: https://marc.info/?l=kde-core-devel&m=112784151703454
MIME-Version: 1
Content-Type: multipart/mixed; boundary="--nextPart2661668.8vgEPhm3zy"

--nextPart2661668.8vgEPhm3zy
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
Content-Disposition: inline

David Faure wrote:
>However the real question is: do you remember why you asked for whether
> "%E1.foo" was a valid URL? It is parsed by the current KURL, but Thiago
> tells me that this url doesn't have a defined interpretation.

I guess we were trying to test á.foo, but you changed it to %E1.foo to
avoid encoding issues.

Let me explain why %E1 is not valid in the hostname part: URLs are
supposed to be UTF-8 binary byte sequences. That means non-ASCII bytes
are supposed to be converted into characters when they form a valid UTF-8
sequence (e.g., %C3%A1), but invalid sequences are not supposed to be
discarded (e.g., %E1).

However, hostnames are Unicode strings, so you can't have invalid high-bit
sequences in UTF-8 representation (and they are impossible in UTF-16
representation).

/usr/bin/idn behaves the same way (run in a UTF-8 environment):
$ echo á.foo | iconv -t latin1 | idn -a --quiet
idn: idna_to_ascii_4z: String preparation failed

BTW, Qt has for some time allowed those non-UTF-8 sequences to be converted
into QStrings and back, by using some reserved characters in UTF-16.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  PGP/GPG: 0x6EF45358; fingerprint:
  E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

2. Tó cennan his weorc gearu, ymbe se circolwyrde, wearð se cægbord and se
leohtspeccabord, and þa mýs cómon lator. On þone dæg, he hine reste.
--nextPart2661668.8vgEPhm3zy
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQBDOX7SM/XwBW70U1gRAiy/AJ4qsJz2xX/gI8Og7jtA5h00Rh6VRwCdHk5F
Q/y99fPwwf1mSgMU35wF1Xg=
=prpo
-----END PGP SIGNATURE-----

--nextPart2661668.8vgEPhm3zy--
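
For readers who want to reproduce the distinction drawn in the message between %C3%A1.foo and %E1.foo, here is a minimal sketch in Python rather than KURL or Qt code. It assumes only the standard library; the helper names decode_host, host_to_ascii and is_valid_host are made up for this illustration, and Python's built-in "idna" codec stands in for /usr/bin/idn, so its error messages will differ from the one quoted above.

```python
from urllib.parse import unquote_to_bytes


def decode_host(percent_encoded_host: str) -> str:
    """Percent-decode a hostname and require the bytes to be valid UTF-8."""
    raw = unquote_to_bytes(percent_encoded_host)
    # Strict decoding raises UnicodeDecodeError for a stray byte such as 0xE1,
    # which starts a multi-byte sequence but is not followed by continuation bytes.
    return raw.decode("utf-8", errors="strict")


def host_to_ascii(percent_encoded_host: str) -> bytes:
    """Convert a percent-encoded hostname to its ASCII (IDNA) form."""
    text = decode_host(percent_encoded_host)
    # Python's built-in codec; an assumption standing in for "idn -a".
    return text.encode("idna")


def is_valid_host(percent_encoded_host: str) -> bool:
    """Return True if the hostname decodes as UTF-8 and converts to ASCII."""
    try:
        host_to_ascii(percent_encoded_host)
    except UnicodeError:  # also covers UnicodeDecodeError
        return False
    return True


for candidate in ("%C3%A1.foo", "%E1.foo"):
    print(candidate, "valid" if is_valid_host(candidate) else "invalid")
```

The strict UTF-8 decode is the step that rejects %E1.foo: the hostname has to become a Unicode string before any IDNA processing can happen, and a lone 0xE1 byte cannot be turned into one.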