From kde-core-devel  Sat Jul 02 23:06:40 2011
From: Chusslove Illich <caslav.ilic () gmx ! net>
Date: Sat, 02 Jul 2011 23:06:40 +0000
To: kde-core-devel
Subject: Re: Translation in Qt5
Message-Id: <201107030106.40601.caslav.ilic () gmx ! net>
X-MARC-Message: https://marc.info/?l=kde-core-devel&m=130964815316690
MIME-Version: 1
Content-Type: multipart/mixed; boundary="--nextPart3400275.CdC9leTsQV"

--nextPart3400275.CdC9leTsQV
Content-Type: text/plain;
  charset="utf-8"
Content-Transfer-Encoding: quoted-printable

>> [: Chusslove Illich :]
>> * PO has to be natively supported. [...] The real advantage instead is in
>> the format and the process; and also tool support on translators' side.
>
> [: Oswald Buddenhagen :]
> how is the process fundamentally different from the linguist toolchain?
> why would the format matter for anything other than the tool support?

First of all, this has to be looked at from the other side -- why would one
replace PO (format, tools, process) with Linguist? If there were indeed no
fundamental difference, then there would also be no need to replace. To
consider replacing PO, Linguist (format, tools, process) would have to offer
some advantages, and I see none.

Having said that, here are the overall advantages of PO:

* PO extraction is unified for many programming languages (over 20) and even
environments within certain languages (KDE, Qt). I do not have to switch my
POT creation tool when working across these languages/environments.

* PO format is a low-fat highly-informative text format, perfectly manually
editable. The way I see it, an XML format is good only in two cases: one is
information exchange, where there is no intention to edit files manually;
the other is when you are making something new and do not dare come up with
a custom format, lest you make a mess with ad-hoc extensions in the future.
But to replace a well-established low-fat highly-informative text format
with a more-or-less faithful XML representation of it, that is just a
no-go.

* PO merging is a critical element of the PO process, and apart from Linguist
I don't know of any other chain which has merging defined at all. And for
Linguist I am not entirely sure that it has merging that well-defined; well,
unless you examined in detail what msgmerge does :)

* Third-party tool support for translators (repeating for completeness).
This includes not only general text editors (syntax highlighting, PO modes)
and specialized translation editors (like Lokalize or Qt Linguist), but also
summarizers (statistics, memories, glossaries...), checkers (syntax, style,
grammar...), and any "higher-order" tools (translation project organization,
review workflow, branch integration...).
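
To make the merging point concrete, here is a minimal Python sketch of the
decisions a merge step has to take. It is not a description of what msgmerge
actually does; the function name merge_catalog, the dictionary layout and the
0.6 similarity cutoff are all assumptions made for illustration.

```python
import difflib

def merge_catalog(old, template, cutoff=0.6):
    """Merge an old translated catalog with a freshly extracted template.

    old:      dict mapping msgid -> msgstr (existing translations)
    template: list of msgids extracted from the current sources
    Returns a dict mapping msgid -> (msgstr, state), where state is
    "translated", "fuzzy" or "untranslated".
    """
    merged = {}
    for msgid in template:
        if msgid in old:
            # Unchanged message: keep the translation as it is.
            merged[msgid] = (old[msgid], "translated")
            continue
        # Changed or new message: look for the closest old msgid.
        close = difflib.get_close_matches(msgid, list(old), n=1, cutoff=cutoff)
        if close:
            # Reuse the old translation, but flag it for review.
            merged[msgid] = (old[close[0]], "fuzzy")
        else:
            merged[msgid] = ("", "untranslated")
    return merged

old = {"Open file": "Datei öffnen", "Quit": "Beenden"}
template = ["Open files", "Quit", "Print"]
result = merge_catalog(old, template)
# "Quit" stays translated, "Open files" becomes fuzzy, "Print" is untranslated.
```

Messages that dropped out of the template are simply omitted here; a real
chain also has to define what happens to those.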

So, I could now go and thoroughly test how l* tools behave compared to msg*
tools to see what details are lacking, and also carefully examine Linguist
DTD for where the format may be lacking, but I just see no point in
spending time on that. Rather, I expect you to tell me where they are
*better*. (And in a sufficiently fundamental way, something that's not just
a matter of submitting a patch.) One thing in this respect you did propose:

> [...] qt tries to have no external tool dependencies, because they are a
> hassle on anything except linux. i'm also not sure how the requirement for
> using gpl'd tools would resonate with some of the proprietary qt
> customers.

I can't see how GPL would matter for them in this. If it's about a "fuzzy
bad feeling", I simply don't care.

But I do see how external tool dependencies matter, given that Qt was always
meant to be quite a self-contained solution. Therefore I have nothing
against keeping around l* tools for that purpose. But msg* tools have to be
an almost drop-in replacement for them (it's ok if precise command sequence
and options differ), for the projects that have no problem in using them.

>> [: Chusslove Illich :]
>> XLIFF is totally out of the question. [...]
>
> [: Oswald Buddenhagen :]
> presumably you mean "cheap" => no plurals. i wouldn't know other
> shortcomings.

Not even going to go into this one :) (At the least all of the above
applies.)

>> [: Chusslove Illich :]
>> For one, translation scripting absolutely requires a public "translated
>> text" class (current KLocalizedString).
>
> [: Oswald Buddenhagen :]
> that sounds like no biggie. i don't see much point in a QStringFormatter
> unrelated to translations - it just seemed like a generalization bonus.
>
>> [: Chusslove Illich :]
>> it is also needed for dynamic context feature (see API doc of
>> KLocalizedString for this one). I could imagine that this alone counts
>> "heavy code-wise" -- does it?
>
> [: Oswald Buddenhagen :]
> uhm, sorry? a simple setter function "heavy"?

Erm, there was a semicolon before that part, which should have made it clear
that the whole sentence was the justification for... never mind :) What I
meant to say is that I was afraid you would consider the introduction of a
KLocalizedString equivalent heavy code-wise.

I actually thought you intended QStringFormatter to be unrelated to
translations. I'm not sure it is smart to relate it directly to
translations. E.g. what would indicate which string should be extracted and
which not? (KLocalizedString cannot be directly constructed, it has to go
through one of i18n*()/ki18n*() calls.) I think the proper chain would be
QStringTranslator -> QStringFormatter -> QString.
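
The chain can be sketched roughly as follows. This is Python rather than C++,
and StringTranslator, StringFormatter, tr(), arg() and to_string() are
hypothetical stand-ins, not existing or proposed Qt API. The point is only
that the formatter knows nothing about translation, and that a translated
text can only be obtained through the translator.

```python
import re

class StringFormatter:
    """Holds a text with %1, %2... placeholders; knows nothing about i18n."""

    def __init__(self, text):
        self._text = text
        self._args = []

    def arg(self, value):
        # Return a new formatter so a partially filled one can be reused.
        new = StringFormatter(self._text)
        new._args = self._args + [str(value)]
        return new

    def to_string(self):
        def substitute(match):
            index = int(match.group(1))
            if 1 <= index <= len(self._args):
                return self._args[index - 1]
            return match.group(0)  # leave unknown placeholders alone
        # Single pass, so argument values are never re-scanned.
        return re.sub(r"%(\d+)", substitute, self._text)


class StringTranslator:
    """The only entry point that looks up translations."""

    def __init__(self, catalog):
        self._catalog = catalog

    def tr(self, text):
        # An extraction tool would only need to scan for tr() calls.
        return StringFormatter(self._catalog.get(text, text))


translator = StringTranslator(
    {"Allow access to %1 by %2?": "Zugriff auf %1 durch %2 erlauben?"})
message = (translator.tr("Allow access to %1 by %2?")
           .arg("Mail").arg("alice").to_string())
plain = StringFormatter("%1 of %2").arg(3).arg(7).to_string()
# message == "Zugriff auf Mail durch alice erlauben?", plain == "3 of 7"
```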

>> [: Chusslove Illich :]
>> And then comes the JavaScript wrapper in the background, file format
>> elements, language-specific scripting libraries, etc.
>
> [: Oswald Buddenhagen :]
> that doesn't matter as long as these are not link-time dependencies of
> qtcore. the point is that non-users should not be penalized by the mere
> presence of the feature.

That's the way it is now in kdecore...

>>> [: Oswald Buddenhagen :]
>>> for advanced formatting, i'm envisioning the syntax %[12.34h]1, i.e.,
>>> sprintf-like options in brackets.
>>
>> [: Chusslove Illich :]
>> I'm somehow converging to the opinion that this would not be good [...]
>
> [: Oswald Buddenhagen :]
> the use case is adjusting the format to available size, which is a very
> real problem on small devices.

Hm... as in, the translator may need to steal a bit from the argument length
in order to make the rest of the text fit? Something smells wrong there...

> i'm not sure how it could be of any significance for translators. they are
> dealing with printf formats outside the kde world all the time. usually
> one can simply consider the formats as atoms.

Yes, and I'm not too happy about it. I commented out some paragraphs from my
previous messages, but here they go:

In the "perfect text translation library" I would like argument placeholders
to be named and fully contained in mirror-character wrappers. E.g. with
braces and in Python, it could look like this:

  i18n("Notification from {appname}", appname=...)
  i18n("Allow access to {service} by {username}?", service=..., username=...)

I think that named placeholders make it easier on the programmer in multi-
placeholder messages, but more importantly, they provide immediate context
for the translator; both these messages would require explicit context
otherwise. Braces naturally separate placeholder from the text, so that
neither a man (a translator) nor a machine (a translation support tool)
is in doubt what is a placeholder (also highlighting rules are easier).
Argument formatting would be done outside, but if sprintf-style is really
desired, it can coexist as an option:

  i18n("Events on {date:<special_syntax_carnage>}", date=someDate)

It would be trivial to automatically convert existing KDE sources, since {1},
{2}... would be valid placeholders. The problem of C++ not having keyword
arguments, and then an arbitrary number of them (Python's **kwargs, CLisp's
&rest), can be treated by either letting {1}, {2}... serve on [1], or perhaps
by falling back to method calls with, well, implicit conversion (see below).

[1] Another option: {1-appname}, where -<foo> is like a small inline
context, limited by convention to argument name.
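
A minimal runnable sketch of the above, built on Python's own str.format().
The CATALOG dictionary and this particular i18n() are assumptions for
illustration only (not an existing library), and the format specification
shown is just what Python happens to offer for dates.

```python
import datetime

# Hypothetical in-memory catalog; a real one would be loaded from a PO/MO file.
CATALOG = {
    "Allow access to {service} by {username}?":
        "Zugriff auf {service} durch {username} erlauben?",
}

def i18n(text, *args, **kwargs):
    """Look up text, then substitute named or numbered placeholders."""
    translated = CATALOG.get(text, text)
    # The leading None lets {1}, {2}... count from one, as in KDE sources.
    return translated.format(None, *args, **kwargs)

named = i18n("Notification from {appname}", appname="Kontact")
translated = i18n("Allow access to {service} by {username}?",
                  service="Mail", username="alice")
numbered = i18n("{1} of {2}", 3, 7)
dated = i18n("Events on {date:%Y-%m-%d}", date=datetime.date(2011, 7, 2))
# named      == "Notification from Kontact"
# translated == "Zugriff auf Mail durch alice erlauben?"
# numbered   == "3 of 7"
# dated      == "Events on 2011-07-02"
```

Since the placeholders carry names, the translation is free to reorder them
without any change on the programmer's side.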

My intention behind this "perfect text translation library" is that
*everything* could use it, providing bindings exist. I want it in a KDE app,
in a random C++ app (could I have it in C app too?), in a Python script, in
a JavaScript snippet, without having to change how i18n is done. In this
light, specially designed argument placeholders for i18n make that much more
sense. Translators then no longer "feel" the underlying programming language
at all.

> the % operator is just a shorthand for subs(). together with an implicit
> QString conversion, it would fix some (minor) problems by removing the
> need for a wrapper function like i18n, which [...]

As I recall, the main reason for introducing function call syntax for most
cases plus method call syntax for special cases, was that I thought that
implicit conversion is dangerous. (Ok, also I very much like function call
syntax, but that alone wouldn't have been enough to make me go for it.) At
the time everything was i18n().arg().arg()... anyway, so the simplest route
would have been to keep it that way and add implicit conversion. But I no
longer remember why I thought implicit conversion is dangerous; and why
people didn't throw at me "don't be stupid, add implicit conversion". I'll
have to do some digging...

> some random differences between gettext() and tr():
>
> - tr() uses %n instead of the first generic integer parameter for
> identifying the plural form. i like it that way, because it's more
> explicit.

I don't like it that way because it introduces a special placeholder for no
particularly good reason. If you have only one integer, then there is no
difference. If you have more than one integer, then you normally have a much
bigger problem at hand than reduced explicitness. (Such a message usually
has to be split into as many partial plural messages as there are numbers,
plus one joiner message; all must have contexts explaining the split.)

In fact, normal Gettext (ngettext() call) is separately given a number which
decides plural, rather than looking through the arguments to be substituted
(well it couldn't examine arguments anyway, since it doesn't capture them).
This is both explicit and does not require a special placeholder. But I felt
it awkward to repeat the argument twice (once for plural decision, once for
placeholder), so I went with taking the first integer.
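
The two conventions side by side, as a rough Python sketch. The helper names
substitute(), plural_explicit() and plural_first_int() are made up for this
example; gettext.NullTranslations is used so that, with no catalog loaded,
the plain English rule (singular only for n == 1) applies.

```python
import gettext
import re

# No catalog loaded: ngettext() falls back to the English singular/plural rule.
_catalog = gettext.NullTranslations()

def substitute(text, args):
    """Replace %1, %2... with the matching argument in a single pass."""
    def repl(match):
        index = int(match.group(1))
        if 1 <= index <= len(args):
            return str(args[index - 1])
        return match.group(0)
    return re.sub(r"%(\d+)", repl, text)

def plural_explicit(singular, plural, n, *args):
    """Gettext style: the deciding number is passed separately."""
    return substitute(_catalog.ngettext(singular, plural, n), args)

def plural_first_int(singular, plural, *args):
    """Take the first integer argument as the deciding number."""
    n = next(a for a in args if isinstance(a, int) and not isinstance(a, bool))
    return substitute(_catalog.ngettext(singular, plural, n), args)

# The count is written twice in the explicit form, once in the other.
explicit = plural_explicit("%1 file in %2", "%1 files in %2", 3, 3, "/tmp")
one = plural_first_int("%1 file in %2", "%1 files in %2", 1, "/tmp")
many = plural_first_int("%1 file in %2", "%1 files in %2", 3, "/tmp")
# explicit == "3 files in /tmp", one == "1 file in /tmp",
# many == "3 files in /tmp"
```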

> - tr() has no plural support for the source language. this means that the
> messages are by definition degraded to elaborate ids
> [...]
> in practice, the need for an additional translation is somewhat annoying
> (and consequently neither qt nor creator have one). i'm yet to be
> convinced which approach is better.

I'm with you on the doubt :) (notwithstanding departure from the Gettext
way).

I have the following additional worry about "elaborate IDs": I wouldn't want
programmers (those who know exactly what they meant) to go and make
meaning-changing fixes in the English translation. Then translators to other
languages would not see those fixes. (You might say that other translations
can be automatically paired by tools with the English translation, but then
we're effectively back to freeze breaks.)

> lupdate somewhat recently gained support for purely informative comments
> (//:, equivalent to //gettext:). these are also used for transmitting meta
> data (message ids, etc.). i kind of dislike this format, because it is
> detached from the actual c++ grammar. so i'm considering a dummy argument
> to qTr() instead:

On the contrary, I think the "detached" way is just fine. That is because
text and context form the message key, and changing either leads to a new
or a fuzzy message for translators -- e.g. breaking a message freeze. Making
the purely informative comment an argument would make it appear far closer
in technical significance to text and context than it actually is.
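
In data-structure terms, the distinction looks roughly like this. The Message
class and status_after_change() are hypothetical names for this sketch and do
not come from Gettext or Qt.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    text: str
    context: str = ""
    comment: str = ""  # purely informative, shown to translators only

    def key(self):
        # Only context and text identify the message; the comment does not.
        return (self.context, self.text)

def status_after_change(old, new):
    """Report how a source change affects an existing translation."""
    if old.key() == new.key():
        return "unchanged"
    return "new or fuzzy"

old = Message("Open", context="menu item")
with_comment = Message("Open", context="menu item",
                       comment="Opens the file dialog")
other_context = Message("Open", context="toolbar button")

comment_only = status_after_change(old, with_comment)
context_changed = status_after_change(old, other_context)
# comment_only == "unchanged", context_changed == "new or fuzzy"
```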

-- 
Chusslove Illich (Часлав Илић)

--nextPart3400275.CdC9leTsQV--