From: Chusslove Illich
Date: Sat, 02 Jul 2011 23:06:40 +0000
To: kde-core-devel
Subject: Re: Translation in Qt5
Message-Id: <201107030106.40601.caslav.ilic () gmx ! net>
X-MARC-Message: https://marc.info/?l=kde-core-devel&m=130964815316690

>> [: Chusslove Illich :]
>> * PO has to be natively supported. [...] The real advantage instead is in
>> the format and the process; and also tool support on translators' side.
>
> [: Oswald Buddenhagen :]
> how is the process fundamentally different from the linguist toolchain?
> why would the format matter for anything other than the tool support?

First of all, this has to be looked at from the other side -- why would one
replace PO (format, tools, process) with Linguist? If there were indeed no
fundamental difference, then there would also be no need to replace. To
consider replacing PO, Linguist (format, tools, process) would have to offer
some advantages, and I see none.

Having said that, here are the overall advantages of PO:

* PO extraction is unified for many programming languages (over 20) and even
  environments within certain languages (KDE, Qt). I do not have to switch
  my POT creation tool when working across these languages/environments.

* PO format is a low-fat highly-informative text format, perfectly manually
  editable. The way I see it, an XML format is good only in two cases: one
  is information exchange, where there is no intention to edit files
  manually; the other is when you are making something new and do not dare
  come up with a custom format, lest you make a mess with ad-hoc extensions
  in the future. But to replace a well-established low-fat
  highly-informative text format with a more-or-less faithful XML
  representation of it, that is just a no-go.

* PO merging is a critical element of the PO process, and I don't know any
  chain other than Linguist which has merging defined at all. And for
  Linguist I am not entirely sure that it has merging that well-defined;
  well, unless you examined in detail what msgmerge does :)

* Third-party tool support for translators (repeating for completeness).
  This includes not only general text editors (syntax highlighting, PO
  modes) and specialized translation editors (like Lokalize or Qt Linguist),
  but also summarizers (statistics, memories, glossaries...), checkers
  (syntax, style, grammar...), and any "higher-order" tools (translation
  project organization, review workflow, branch integration...).

So, I could now go and thoroughly test how l* tools behave compared to msg*
tools to see what details are lacking, and also carefully examine the
Linguist DTD for where the format may be lacking, but I just see no point in
spending time on that. Rather, I expect you to tell me where they are
*better*. (And in a sufficiently fundamental way, something that's not just
a matter of submitting a patch.) One thing in this respect you did propose:

> [...] qt tries to have no external tool dependencies, because they are a
> hassle on anything except linux. i'm also not sure how the requirement for
> using gpl'd tools would resonate with some of the proprietary qt
> customers.

I can't see how GPL would matter for them in this. If it's about a "fuzzy
bad feeling", I simply don't care. But I do see how external tool
dependencies matter, given that Qt was always meant to be quite a
self-contained solution. Therefore I have nothing against keeping around l*
tools for that purpose. But msg* tools have to be an almost drop-in
replacement for them (it's ok if the precise command sequence and options
differ), for the projects that have no problem in using them.

>> [: Chusslove Illich :]
>> XLIFF is totally out of the question. [...]
>
> [: Oswald Buddenhagen :]
> presumably you mean "cheap" => no plurals.
> i wouldn't know other shortcomings.

Not even going to go into this one :) (At the least all of the above
applies.)

>> [: Chusslove Illich :]
>> For one, translation scripting absolutely requires a public "translated
>> text" class (current KLocalizedString).
>
> [: Oswald Buddenhagen :]
> that sounds like no biggie. i don't see much point in a QStringFormatter
> unrelated to translations - it just seemed like a generalization bonus.
>
>> [: Chusslove Illich :]
>> it is also needed for dynamic context feature (see API doc of
>> KLocalizedString for this one). I could imagine that this alone counts
>> "heavy code-wise" -- does it?
>
> [: Oswald Buddenhagen :]
> uhm, sorry? a simple setter function "heavy"?

Erm, there was a semi-colon before that part, which should have made it
clear that the whole sentence was the justification for... never mind :)
What I meant to say is that I was afraid you would consider the introduction
of a KLocalizedString equivalent heavy code-wise.

I actually thought you intended QStringFormatter to be unrelated to
translations. I'm not sure it is smart to relate it directly to
translations. E.g. what would indicate which string should be extracted and
which not? (KLocalizedString cannot be directly constructed, it has to go
through one of i18n*()/ki18n*() calls.) I think the proper chain would be
QStringTranslator -> QStringFormatter -> QString.

>> [: Chusslove Illich :]
>> And then comes the JavaScript wrapper in the background, file format
>> elements, language-specific scripting libraries, etc.
>
> [: Oswald Buddenhagen :]
> that doesn't matter as long as these are not link-time dependencies of
> qtcore. the point is that non-users should not be penalized by the mere
> presence of the feature.

That's the way it is now in kdecore...

>>> [: Oswald Buddenhagen :]
>>> for advanced formatting, i'm envisioning the syntax %[12.34h]1, i.e.,
>>> sprintf-like options in brackets.
>>
>> [: Chusslove Illich :]
>> I'm somehow converging to the opinion that this would not be good [...]
>
> [: Oswald Buddenhagen :]
> the use case is adjusting the format to available size, which is a very
> real problem on small devices.

Hm... as in, the translator may need to steal a bit from the argument length
in order to make the rest of the text fit? Something smells wrong there...

> i'm not sure how it could be of any significance for translators. they are
> dealing with printf formats outside the kde world all the time. usually
> one can simply consider the formats as atoms.

Yes, and I'm not too happy about it. I commented out some paragraphs from my
previous messages, but here they go:

In the "perfect text translation library" I would like argument placeholders
to be named and fully contained in mirror-character wrappers. E.g. with
braces and in Python, it could look like this:

    i18n("Notification from {appname}", appname=...)
    i18n("Allow access to {service} by {username}?",
         service=..., username=...)

I think that named placeholders make it easier on the programmer in
multi-placeholder messages, but more importantly, they provide immediate
context for the translator; both these messages would require explicit
context otherwise. Braces naturally separate the placeholder from the text,
so that neither a man (a translator) nor a machine (a translation support
tool) is in doubt what is a placeholder (also highlighting rules are
easier). Argument formatting would be done outside, but if sprintf-style is
really desired, it can coexist as an option:

    i18n("Events on {date:}", date=someDate)

It would be trivial to automatically convert existing KDE sources, since
{1}, {2}... would be valid placeholders. The problem of C++ not having
keyword arguments, and then an arbitrary number of them (Python's **kwargs,
CLisp's &rest) can be treated by either letting {1}, {2}...
serve on[1], or perhaps by falling back to method calls with, well, implicit
conversion (see below).

[1] Another option: {1-appname}, where - is like a small inline context,
limited by convention to argument name.

My intention behind this "perfect text translation library" is that
*everything* could use it, providing bindings exist. I want it in a KDE app,
in a random C++ app (could I have it in a C app too?), in a Python script,
in a JavaScript snippet, without having to change how i18n is done. In this
light, specially designed argument placeholders for i18n make that much more
sense. Translators then no longer "feel" the underlying programming language
at all.

> the % operator is just a shorthand for subs(). together with an implicit
> QString conversion, it would fix some (minor) problems by removing the
> need for a wrapper function like i18n, which [...]

As I recall, the main reason for introducing function call syntax for most
cases plus method call syntax for special cases was that I thought that
implicit conversion is dangerous. (Ok, also I very much like function call
syntax, but that alone wouldn't have been enough to make me go for it.) At
the time everything was i18n().arg().arg()... anyway, so the simplest route
would have been to keep it that way and add implicit conversion.

But I no longer remember why I thought implicit conversion is dangerous; and
why people didn't throw at me "don't be stupid, add implicit conversion".
I'll have to do some digging...

> some random differences between gettext() and tr():
>
> - tr() uses %n instead of the first generic integer parameter for
> identifying the plural form. i like it that way, because it's more
> explicit.

I don't like it that way because it introduces a special placeholder for no
particularly good reason. If you have only one integer, then there is no
difference. If you have more than one integer, then you normally have a much
bigger problem at hand than reduced explicitness.
(Such a message usually has to be split, into as many partial plural
messages as there are numbers, plus one joiner message; all must have
contexts explaining the split.)

In fact, normal Gettext (ngettext() call) is separately given a number which
decides the plural, rather than looking through the arguments to be
substituted (well, it couldn't examine arguments anyway, since it doesn't
capture them). This is both explicit and does not require a special
placeholder. But I felt it awkward to repeat the argument twice (once for
the plural decision, once for the placeholder), so I went with taking the
first integer.

> - tr() has no plural support for the source language. this means that the
> messages are by definition degraded to elaborate ids
> [...]
> in practice, the need for an additional translation is somewhat annoying
> (and consequently neither qt nor creator have one). i'm yet to be
> convinced which approach is better.

I'm with you on the doubt :) (notwithstanding departure from the Gettext
way). I have the following additional worry about "elaborate IDs": I
wouldn't want programmers (those who know exactly what they meant) to go and
make meaning-changing fixes in the English translation. Then translators to
other languages would not see those fixes. (You might say that other
translations can be automatically paired by tools with the English
translation, but then we're effectively back to freeze breaks.)

> lupdate somewhat recently gained support for purely informative comments
> (//:, equivalent to //gettext:). these are also used for transmitting meta
> data (message ids, etc.). i kind of dislike this format, because it is
> detached from the actual c++ grammar. so i'm considering a dummy argument
> to qTr() instead:

On the contrary, I think the "detached" way is just fine. That is because
text and context form up the message key, and changing either leads to a new
or a fuzzy message for translators -- e.g. breaking a message freeze.
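
To illustrate the point about the message key, here is a minimal sketch in
Python. It is not how Gettext or lupdate store things internally; the
Message class, the lookup() function and the one-entry catalog are
hypothetical names made up for this example. It only assumes that a
translation is found by the pair (context, text), while the comment travels
alongside without taking part in the lookup.

```python
# Hypothetical sketch: a PO-like catalog keyed on (context, text).
# The extracted comment is carried with the message but is NOT part
# of the key, so editing it cannot invalidate a translation.
from dataclasses import dataclass


@dataclass
class Message:
    context: str  # disambiguating context (msgctxt in PO terms)
    text: str     # source text (msgid in PO terms)
    comment: str  # purely informative note for translators

    def key(self):
        # Only context and text identify the message.
        return (self.context, self.text)


def lookup(catalog, message):
    # Return the translation, or None when the key is unknown, i.e. the
    # message would show up as new (or fuzzy) for translators.
    return catalog.get(message.key())


# Illustrative one-entry catalog.
catalog = {("menu", "Open"): "Otvori"}

original = Message("menu", "Open", "Shown in the File menu")
new_comment = Message("menu", "Open", "Shown in File menu and toolbar")
new_context = Message("dialog button", "Open", "Shown in the File menu")

print(lookup(catalog, original))     # translation found
print(lookup(catalog, new_comment))  # same key, translation still found
print(lookup(catalog, new_context))  # different key, nothing found
```

Rewording the comment leaves the key untouched, whereas changing the context
produces a key the catalog has never seen.
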
Making the purely informative comment an argument would make it appear far
closer in technical significance to text and context than it actually is.

-- 
Chusslove Illich (Часлав Илић)