Zack Rusin wrote:
>Hey,
>
>as was pointed out a while ago, one of the reasons QUrl was rewritten
>was that it was supposed to replace KURL in KDE 4.

Good. That's a start. KURL has turned into a mess and needs some
cleaning. Not that KURL doesn't work, but as a central component it has
to be very lean.

I'd also like to emphasize that KURL does not pass the IRI tests. We'll
have to test QUrl to make sure it does.

Another thing is that KURL is wrongly named: it deals with URIs, not just
URLs.

>> - convert a hostname back from ACE automatically (fromPunycode is
>> never called)
>
>That's a bug; I've made a task of it. Of course QUrl should call
>fromPunycode :-).

Good. Just make sure we can get both "forms" of the URL: the presentation
form (ToUnicode) and the internal form (ToASCII). Currently
KURL::prettyURL also converts %20 into spaces, and the printable high
characters are decoded. About decoding: URLs are always UTF-8.

By the way, the "proper" names for the IDNA transformations are ToUnicode
and ToASCII. Punycode is a Unicode encoding, just like UTF-7, UTF-8,
UTF-16 or UCS-4. Punycode would probably fit better as a QTextCodec
(I'm not sure if it's been assigned a MIB number).

For instance, my full name (Thiago José Macieira) is encoded in Punycode
as:
  Thiago Jos Macieira-kzb
whereas if I applied ToASCII, it would come out as:
  xn--thiago jos macieira-kzb

Note the lowercasing and the xn-- prefix. In other words:
ToASCII = nameprep + punycode + "xn--" prefix.

>> - handle URL-looking non-URLs (example:
>> ed2k://|file|Ugly_looking[file]name|343928602|00000000000000000000000000000000|/)
>
>QUrl follows the URI specification in this respect. QUrl does the right
>thing in rejecting it. If KUrl accepted this stuff, then that's really
>bad.

Unfortunately, that's required. Those URLs are in use, even by a KDE
application (kmldonkey).

What exactly does QUrl do to a rejected URL? Refuse to parse it
completely? Or does it try to transform it in any way? What KURL did was
parse the part between // and / as a hostname, which meant applying
ToASCII to that part and, thus, breaking the filename and hash.

If QUrl simply refuses to do anything with it, that would help. It would
be better if it did what KURL does now: recognise it as a broken
URL-looking URI and not touch anything after the ed2k: part.

>> QUrl has a strict parser
>>
>>> Apparently, QUrl accepts file:/path URLs (no ///)
>
>I don't understand this. Both file:/path and file:///path are allowed
>according to the spec.

Which one does it generate? The other-desktop developers will yell at us
if we start generating file:/path URLs again.

>> Warning: Verify that the extra folding mandated by IDNA is done! It
>> does QUnicodeTables::normalize(labels.at(i),
>> QString::NormalizationForm_KC, QChar::Unicode_3_1), but IDNA requires
>> more than NFKC.
>
>If so, then this has changed in the specification. If we do not do
>exactly what the spec does, then this is a bug.

The relevant spec here is Nameprep (RFC 3491). It is but a profile of
Stringprep (RFC 3454). What Nameprep does is:
- NFKC (compatibility characters are replaced and combining diacriticals
  are joined to the letter)
- case-folding (roughly lowercasing; this is also where ß becomes ss)
- additional folding of homographs (like turning the µ symbol into the
  Greek lowercase letter μ [they may look the same, but they are not])

Also note that this step is likely to be changed soon by new RFCs, given
the homograph issues of two months ago. The Nameprep profile may change,
and the tables may be upgraded to Unicode 4.0.

By the way, it may be useful to expose the Nameprep routine, though I
don't think it belongs in QUrl.

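The difference between raw Punycode, Nameprep and ToASCII discussed above
can be tried out with Python's standard library, which ships an IDNA 2003
implementation (RFC 3490 / RFC 3491) in the encodings.idna module. This is
only a sketch of the transformations themselves, not of what QUrl or
KResolver do internally, and the helper name describe_label is made up for
the example.

```python
from encodings import idna  # stdlib IDNA 2003 (RFC 3490) helpers


def describe_label(label: str) -> dict:
    """Return the different forms of a single hostname label.

    describe_label is a hypothetical helper for this sketch.
    """
    return {
        # Raw Punycode: just an encoding, no case folding, no prefix.
        "punycode": label.encode("punycode").decode("ascii"),
        # Nameprep: NFKC normalisation plus case and homograph folding.
        "nameprep": idna.nameprep(label),
        # ToASCII: nameprep + punycode + the "xn--" ACE prefix.
        "to_ascii": idna.ToASCII(label).decode("ascii"),
    }


forms = describe_label("José")
print(forms)

# ToUnicode takes the ACE form back to the presentation form.
print(idna.ToUnicode(forms["to_ascii"]))

# Nameprep folds more than case: sharp s and the micro sign are mapped too.
print(idna.nameprep("Straße"), idna.nameprep("\u00b5") == "\u03bc")
```

For the label "José", raw Punycode keeps the capital J (Jos-dma), while
ToASCII lowercases it through Nameprep and adds the prefix (xn--jos-dma).
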
In KDE code, the Nameprep routine is in the resolver (KResolver). I'd
like to avoid duplication: if Qt provides it, there's no need to link to
libidn.

>> - manipulate the special query "charset" (called fileEncoding)
>
>This _can_ be added to QUrl, but will probably not be.

Agreed. This kind of manipulation shouldn't be in the class, but in some
kind of external manipulator. Don't pollute the class interface with
unnecessary functions.

>> - convert non-ASCII hostnames to IDN, including in mailto: URIs
>
>The spec says nothing about what comes after mailto:. So you are free to
>call toPunyCode() and fromPunyCode() when generating mailto urls.

True. mailto: isn't a URL, so QUrl doesn't have to handle it.

It is, however, a valid URI and, as a URI parser, KURL did handle it. So
it would be nice if QUrl (or, maybe, QUri) handled it as well.

Currently, the part after the @ is supposed to accept IDNs, so you could
send me an email to thiago@josé.macieira.info, if it got parsed into
thiago @ xn--jos-dma.macieira.info. In the future, the part before the @
will be internationalised as well.

KURL has 4 modes of operation, depending on the "URI mode": full URL
compliance, mailto URIs, raw URI and invalid. What's more, there are URNs
as well. KURL doesn't handle them, but it is a feature that has been
asked for. A URN parser was posted to kde-core-devel a while ago.

Maybe this is a case for having a proper superclass that can be
specialised into QUrl.

>> - deal with "sub-URLs", which are hardcoded. See
>> http://bugs.kde.org/show_bug.cgi?id=73821
>
>Suburls should be gone in KDE 4, you can talk to David Faure about
>this. We already agreed on this point. Suburls break the URI
>specification, and are practically unused and totally confusing.

Agreed. Out with them. There are some interesting ideas being discussed
on http://bugs.kde.org/show_bug.cgi?id=73821 and in bug 102265.

My contribution to the discussion would be to leave the processing
entirely to KIO, with a special "multi" ioslave. One of the forms I
proposed, and the one I find cleanest, would however require a URI, not
a URL. Example:

multi:http://localhost/~thiago/archive.zip,zip:/dirname/filename.gz,gzip:/

Meaning: decompress dirname/filename.gz from the zip archive
http://localhost/~thiago/archive.zip.

-- 
  Thiago Macieira - thiago (AT) macieira (DOT) info
    PGP/GPG: 0x6EF45358; fingerprint:
    E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

4. And æfter se scieppend ingelogode, he wrát "cenn", ac eala! se
rihtendgesamnung andswarode "cenn: ne wát hú cennan 'eall'. Ástynt."