From: Chusslove Illich
Date: Thu, 07 Jul 2011 09:48:40 +0000
To: kde-core-devel
Subject: Re: Translation in Qt5
Message-Id: <201107071148.46939.caslav.ilic () gmx ! net>
X-MARC-Message: https://marc.info/?l=kde-core-devel&m=131003239219020

> [: Oswald Buddenhagen :]
> so in summary, you think we *should* go for semantic highlighting if the
> problems can be adequately solved, yes?

That is the summary, yes. But what I would want to make it adequate may be
too heavyweight, while the lightweight solutions you propose I don't
consider adequate:

>> [: Chusslove Illich :]
>> First, some people really didn't like that semantic markup was thrown
>> onto everyone [...]
>
> [: Oswald Buddenhagen :]
> i'm not overly concerned. i'm sure somebody will complain if i make the
> new formatter reject %blubb (requiring %%blubb) instead of silently passing
> it literally, despite this being a very sane thing to do.

I know you are just giving an example, but escaping formatting directives is
a tiny thing to think about, and the expected thing in string formatting
engines, compared to contextually-handled markup.

I really want the possibility to disable markup handling. I only wonder how
to do it exactly, and what should be the default.

>> [: Chusslove Illich :]
>> The second problem is escaping and substitution. [...]
>
> [: Oswald Buddenhagen :]
> the answer is the per-placeholder possibility to disable auto-quoting:
>   qTr("foo: %1 is %q2") % foo % bar;
> ("q" as in "pre-Quoted")

Having to remember to prevent escaping is certainly better than having to
remember to do escaping, but it does not solve the more fundamental problem.
Markup should be resolved *at the very end*, when the final composed text
is sent to the output device (a UI widget, standard output, etc). Recall my
example with "... ... ..." insert string being resolved too early.

Also, I very much don't like no-escaping being done through a formatting
directive. If formatting directives are to be used, then at the very least
they should not contain anything that translators absolutely must not
change.

>> [: Chusslove Illich :]
>> The third problem [...] fixed semantic markup, there is the problem of
>> set of tags. [...]
>
> [: Oswald Buddenhagen :]
> that's easy ...
>   qTr("Uses of %1 from %2").markup("strong") % nodes[1] % nodes[0];
> did you have something like that in mind?

If you didn't have second thoughts when writing this, shame on you :) This
means that the programmer cannot define custom markup in one place, that it
is directly linked to the output format (no semantics), and that the
translator cannot modify this definition (e.g. some will want to avoid bold
text, and many will need to change that which resolves into quotes...).

* * *

I would like the following.

First of all, for the moment completely forget about translation.

The programmer should have a facility to define custom markup, which can
contextually resolve into various output formats, and use this markup to
build up user-visible text. This markup does not really have to be semantic
(and that cannot be enforced at any rate); the programmer may also go for
purely visual tags, which then resolve into HTML, shell sequences, or
whatever.

The markup must be definable in one place, so that it can be used all over
the application/library. It should be defined which tags, with which
attributes, resolve into what, for each needed output format. Something like
(see below for QUITComposer):

    QUITComposer::setTag("important", QUITFormat::Rich,
                         "<strong>%1</strong>");
    QUITComposer::setTag("important", QUITFormat::Term,
                         "\033[1m%1\033[0m");
    ...

Then the example above can be written just as it was originally, only
shifted to markup instead of piecing up strings:

    qTr("Uses of %1 from %2")...

There would also be the possibility to specialize tags by attributes, and to
set formatter functions instead of plain substitution strings.

There should be a standalone class which does this. I don't know how to name
it exactly, but it should not be named *String, because it really is not a
string as such. It is rather a UI text composer and markup transformer. So
let's call it QUITComposer.
The only methods of this class should be argument substitution and a
resolver to QString (and/or implicit conversion), and possibly some other
special methods. Consider now:

    int line;
    ...
    QString filename;
    ...
    QUITComposer problem = qtc("%1 does not exist.").subs(filename);
    // ...markup not resolved, qtc() shorthand for QUITComposer(),
    // programmer chose because was too long,
    // filename was automatically escaped.
    ...
    QUITComposer report = qtc("Error in line %1: %2")
                          .subs(line).subs(problem);
    // ...markup not resolved yet, line number is automatically converted
    // to string according to locale (and then escaped if necessary);
    // problem is not escaped because it is a QUITComposer object.
    ...
    showInGui(report); // ...markup resolution happens *somewhere* here.
    writeToStdout(report);
    writeToLogfile(report);

In an ideal world, all of showInGui(), writeToStdout(), writeToLogfile() can
take QUITComposer as argument, and then inside they do
report.toString(QUITFormat::Rich), report.toString(QUITFormat::Term),
report.toString(QUITFormat::Plain), respectively.[1]

[1] This is what I did in that Python code I mentioned. There is a single
module with all the reporting functions, and each can take composer text.
While the output destination is always stdout or a file, it detects whether
it is a TTY to use shell colors, or the user may globally force HTML output
so that he can insert it directly into a web page.

In the world as we have it, there should be a mechanism for explicit format
selection. If necessary it can be explicit:

    showInGui(report.toString(QUITFormat::Rich));

There can also be shortcuts, .toRich(), etc. If the programmer knows there
is only one type of destination for the given string, selection can also be
at the place of definition:

    QUITComposer report = qtcRich("Error in line %1: %2")...

where qtcRich() would be a shorthand for
QUITComposer(...).setFormat(QUITFormat::Rich).

Or if the whole code can mostly use one format, then this is also available:

    QUITComposer::setDefaultFormat(QUITFormat::Rich);
    ...
    QUITComposer report = qtc("Error in line %1: %2")...
    ...
    showInGui(report); // rich on implicit conversion
    showInGui(report.toString()); // also rich, explicit conversion
    writeToLogfile(report.toString(QUITFormat::Plain)); // explicit to plain
    writeToLogfile(report.toPlain()); // explicit to plain, short version

Only when QUITComposer::toString() is called are all QUITComposer objects
that were substituted as arguments resolved themselves, recursively. All
assume the target format of the topmost QUITComposer. If the programmer
cared to define nesting constraints, those too are checked for the whole
composition (e.g. that you haven't substituted a inside a <para>), and
warnings/spoofs/escapes produced on errors.

Internally, the final raw string is actually composed first, with original
markup, and then the markup resolver runs over it. This enables nesting
constraints and proper interactions (e.g. <emphasis>Blah, blah
<emphasis>blah</emphasis> blah blah.</emphasis>). There would also be a
QUITComposer::toRaw() method, which simply ignores markup, so toString() is
implemented as:

    QString QUITComposer::toString (QUITFormat fmt)
    {
        QStringList rawargs;
        // ...
        // Resolve stored arguments, using toRaw() on those that are
        // QUITComposer.
        // ...
        QString raw = m_raw; // own raw string with placeholders
        // ...
        // Substitute rawargs into raw.
        // ...
        // Resolve markup in raw according to fmt, doing all the checks.
        // ...
        return final;
    }

Now we come back to translation. It fits snugly on top of this.

Let's call the class QUITTranslator. It would inherit QUITComposer, because
it too is, in effect, a UI text transformer. In particular, it would
override QUITComposer::toString() (which would be virtual in QUITComposer),
so that it can translate its own raw string before substituting
placeholders.

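
A minimal sketch of the delayed resolution described above, written against
std::string so that it stands alone. Everything in it is an assumption made
for illustration: Composer and Format are hypothetical stand-ins for
QUITComposer and QUITFormat, placeholders are limited to %1..%9, numbers are
not localized, and the tag syntax is a bare <tag>...</tag> without
attributes or nesting checks. It is not Qt API and not the code from the
Python implementation mentioned in [1].

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for QUITFormat/QUITComposer; this is not Qt API.
enum class Format { Plain, Rich };

class Composer {
public:
    explicit Composer(std::string raw) : raw_(std::move(raw)) {}

    // Define what <tag>...</tag> expands to in one format; "%1" is the content.
    static void setTag(const std::string& tag, Format fmt, const std::string& pattern) {
        assert(!tag.empty());
        tags()[Key(tag, fmt)] = pattern;
    }

    // Plain strings are escaped on the way in; composers are stored unresolved.
    Composer& subs(const std::string& text) { args_.push_back(Composer(escape(text))); return *this; }
    Composer& subs(int number) { args_.push_back(Composer(std::to_string(number))); return *this; }
    Composer& subs(const Composer& nested) { args_.push_back(nested); return *this; }

    // Raw text with all arguments substituted recursively, markup untouched.
    std::string toRaw() const {
        std::string out;
        for (std::size_t i = 0; i < raw_.size(); ++i) {
            const bool digit = raw_[i] == '%' && i + 1 < raw_.size()
                               && raw_[i + 1] >= '1' && raw_[i + 1] <= '9';
            const std::size_t n = digit ? static_cast<std::size_t>(raw_[i + 1] - '1') : 0;
            if (digit && n < args_.size()) { out += args_[n].toRaw(); ++i; }
            else out += raw_[i];
        }
        return out;
    }

    // Markup is resolved only here, over the whole composed raw string.
    std::string toString(Format fmt) const {
        struct Level { std::string tag, text; };
        const std::string raw = toRaw();
        std::vector<Level> open(1);  // open[0] collects the top-level text
        for (std::size_t i = 0; i < raw.size();) {
            const std::size_t end = raw[i] == '<' ? raw.find('>', i) : std::string::npos;
            if (end == std::string::npos) { open.back().text += raw[i++]; continue; }
            const std::string name = raw.substr(i + 1, end - i - 1);
            i = end + 1;
            if (name.empty() || name[0] != '/') {
                open.push_back(Level{name, ""});
            } else if (open.size() > 1 && open.back().tag == name.substr(1)) {
                const std::string inner = open.back().text;
                open.pop_back();
                open.back().text += expand(name.substr(1), fmt, inner);
            }  // a stray closing tag is dropped; a real resolver would warn
        }
        std::string out;
        for (const Level& level : open) out += level.text;  // unclosed tags lose their markup
        return fmt == Format::Rich ? out : unescape(out);
    }

private:
    using Key = std::pair<std::string, Format>;
    static std::map<Key, std::string>& tags() {
        static std::map<Key, std::string> table;
        return table;
    }
    static std::string expand(const std::string& tag, Format fmt, const std::string& inner) {
        const auto it = tags().find(Key(tag, fmt));
        if (it == tags().end()) return inner;  // unknown tag: keep only the text
        std::string result = it->second;
        const std::size_t pos = result.find("%1");
        if (pos != std::string::npos) result.replace(pos, 2, inner);
        return result;
    }
    static std::string replaceAll(std::string text, const std::string& from, const std::string& to) {
        for (std::size_t pos = 0; (pos = text.find(from, pos)) != std::string::npos; pos += to.size())
            text.replace(pos, from.size(), to);
        return text;
    }
    static std::string escape(const std::string& text) {
        return replaceAll(replaceAll(text, "&", "&amp;"), "<", "&lt;");
    }
    static std::string unescape(const std::string& text) {
        return replaceAll(replaceAll(text, "&lt;", "<"), "&amp;", "&");
    }

    std::string raw_;
    std::vector<Composer> args_;
};
```

With a hypothetical <filename> tag defined as "<tt>%1</tt>" for Rich and
"'%1'" for Plain, substituting the file name a<b.txt into a nested composer
gives <tt>a&lt;b.txt</tt> in rich output and 'a<b.txt' in plain output. The
same report object serves both destinations because nothing is resolved
until toString() is called.
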
Since QUITComposer delays argument resolution, QUITTranslator also
intercepts them (rawargs above) and delivers them to the JavaScript-scripted
translation, if there is one.

When QUITTranslator is used, and destination outputs do not take
QUITComposer, the programmer may also opt to specify target formats in the
context, through a context marker. Then QUITTranslator would internally call
setFormat() based on what it parsed from the context. E.g:

    QUITTranslator::setFormatByContext("@info", "", QUITFormat::Rich);
    QUITTranslator::setFormatByContext("@info", "progress", QUITFormat::Plain);
    ...
    QUITTranslator report = qtr("@info", "Error in line %1: %2")
                            .subs(line).subs(problem);
    // ...qtr() is shorthand for QUITTranslator(),
    // but it is also the only way to create a QUITTranslator object.
    ...
    showInGui(report); // rich on implicit conversion, due to @info
    showInGui(report.toString()); // also rich due to @info, explicit
    writeToLogfile(report.toPlain()); // can still override, of course

or one can also have it in one go:

    QString report = qtr("@info", "Error in line %1: %2")
                     .subs(line).subs(problem);
    // ...produces a rich text QString outright, implicit conversion.

If implicit conversion is not provided (still open on that one, I'll get to
it), then again there can be a set of templates like i18n*(), say fqtr*()
(f* standing for "function call syntax").

Finally we come to disabling/enabling markup. If it is disabled, then
QUITComposer, and by consequence QUITTranslator, just ignores markup[2].

Markup can obviously be disabled on each individual QUITComposer object, but
I'm not happy with that solution. First, irrespective of translation, it is
absurd to do that in each place in the code if markup is nowhere wanted.
(Note that there is a point in using QUITComposer even if markup is fully
disabled, due to its placeholder substitution; I'll get to that too in
another reply.) So a global switch is necessary.

[2] And therefore Pino will not want to bite my head off any more.

Unfortunately, translation complicates this further, because with
translation in the picture, per-instance disabling should not be allowed at
all. It should be only "global" in the sense of "within the same translation
domain" (this is PO terminology, in practice it means "within the PO file of
a certain base name"). This is in order to allow translators to use markup
themselves, when the programmer didn't use it, e.g:

    msgid "File '%1' does not exist."
    msgstr "Datoteka <filename>%1</filename> ne postoji."

For this to be possible, two things have to hold. One is that it must be
certain that all messages in the given PO file can use QUIT markup (this
would be indicated through the X-Markup: PO header field). Hence the
necessity to enable/disable markup only on a per-domain basis. The second is
that there has to be a default set of tags, which is not a problem. This set
of tags can be arbitrarily large (e.g. it can include pure visual tags),
which is not a problem of the type I mentioned before, since programmers can
define their own tags too instead of having to pick among
not-quite-what-I-need from the default set.

The per-domain switch is problematic because, more fundamentally, we
currently do not have a per-domain concept in KDE i18n. All loaded PO files
within the process form a "single namespace", which is causing bugs of the
type that one library contains the message "Sun" as in short for "Sunday",
and another the message "Sun" as in the star, and then... Yes, short
messages should be equipped with contexts anyway, but the number of these
problems is increasing rather than subsiding with time. (Not having done
something about this in the KDE 3->4 round I consider my biggest blunder in
that effort.)

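
The "Sun" collision can be made concrete with a small sketch. It assumes a
toy in-memory store (the Catalogs class, its method names, the domain names
and the translated strings are all hypothetical, not gettext or KDE API) and
contrasts a lookup over one shared namespace with a lookup restricted to a
single domain.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical toy catalog store; neither gettext nor KDE API.
class Catalogs {
public:
    // Load one translated message from the catalog of the given domain.
    void add(const std::string& domain, const std::string& msgid, const std::string& msgstr) {
        assert(!domain.empty());
        perDomain_[std::make_pair(domain, msgid)] = msgstr;
        shared_[msgid] = msgstr;  // single namespace: the catalog loaded last wins
    }

    // Lookup across everything loaded in the process.
    std::string lookupShared(const std::string& msgid) const {
        const auto it = shared_.find(msgid);
        return it == shared_.end() ? msgid : it->second;
    }

    // Lookup restricted to one domain; untranslated messages fall back to msgid.
    std::string lookup(const std::string& domain, const std::string& msgid) const {
        const auto it = perDomain_.find(std::make_pair(domain, msgid));
        return it == perDomain_.end() ? msgid : it->second;
    }

private:
    std::map<std::pair<std::string, std::string>, std::string> perDomain_;
    std::map<std::string, std::string> shared_;
};
```

If a calendar library and an astronomy library both register "Sun", the
shared lookup returns whichever translation was loaded last for both of
them, while the per-domain lookup keeps the two apart.
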
Pure Gettext has the d*gettext() family of functions, so one puts into a
private library header file something like:

    #define DOMAIN "foobar"
    #define _(msgid) dgettext(DOMAIN, msgid)
    #define p_(msgctxt, msgid) dpgettext(DOMAIN, msgctxt, msgid)
    ...

Ordinary gettext*() calls look only into the PO domain set by textdomain()
(this is ok for use in applications, though not particularly helpful, since
shorthand macros are always defined).

I haven't checked: how is this currently handled in the Linguist system?

Going back to defining markup, when translations are in the picture, parts
of the definition have to be exposed to translators. But this is easy:

    QUITComposer::setTag("important", QUITFormat::Rich,
                         qtr("!uimarkup:<important>/rich",
                             "<strong>%1</strong>").toSelfRaw());
    QUITComposer::setTag("important", QUITFormat::Plain,
                         qtr("!uimarkup:<important>/plain",
                             "*%1*").toSelfRaw());

The context tells that this is a markup expansion definition. Currently in
KUIT I used the more terse "@important/rich", but it should be more
explicit. Unlike QUITComposer::toRaw(), which returns the raw text with all
arguments recursively substituted, QUITComposer::toSelfRaw() returns only
its own raw string, with no substitutions.

Messages such as this will allow the translator to change the formatting,
and also allow tools to dig out defined tags and use them for validation
(adding them to the set of default tags); this is why the context here is
not free, but should have a specification of its own.

That is basically what I had in mind about markup.

-- 
Chusslove Illich (Часлав Илић)