[prev in list] [next in list] [prev in thread] [next in thread] 

List:       kde-frameworks-devel
Subject:    Re: Review Request 128243: Fix parsing of env vars in KLocalizedStringPrivateStatics::initializeLoca
From:       Chusslove Illich <caslav.ilic () gmx ! net>
Date:       2016-06-26 21:08:52
Message-ID: 20160626210852.5107.62403 () mimi ! kde ! org
[Download RAW message or body]

--===============0305899511823480279==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit



> On Јун 19, 2016, 1:11 пре п., Chusslove Illich wrote:
> > a) Right, old code is obviously wrong (sigh).
> > 
> > b) According to Gettext convention, LANGUAGE is a fine-tuning variable and it \
> > (each of its colon-separated parts) is to be taken as-is, without the system \
> > trying to be smart.
> 
> Friedrich W. H. Kossebau wrote:
> a) possibly stayed undetected as there might be no locales in real world which have \
> both modifier and charset set. Still is better to have the code straight, less \
> confusing for anyone who might come to read it (like me :) ). 
> b) My (recently gained) theoretic knowledge about the LANGUAGE env var is mainly \
> from ABOUT-NLS, and there it says "'LL_CC' combinations can be abbreviated as 'LL' \
> to denote the language's main dialect.". From which I had guessed that e.g. both \
> "LANGUAGE=ru_RU:foo:bar" and "LANGUAGE=ru:foo:bar" would result in the \
> LOCALEPREFIX/ru/LC_MESSAGES/" catalogs being picked up (at least when there are no \
> ru_RU of course). Being a newbie on this field, I miss real-world experience surely \
> and just can talk from what I understood so far and see on my system. 
> Now, I have not yet got to understanding the code in the gettext lib, but from \
> testing on commandline "LANGUAGE=ru_RU:de:en" will result in the catalogs from the \
> ru folder being used e.g. for the coreutils tool "ls": ```
> LANG=de_DE.UTF-8 LC_MESSAGES= LC_ALL= LANGUAGE=ru_RU.UTF-8:de:en ls --help
> ```
> (yes, even using charset here, to show that even that is dealt with when set, \
> though perhaps just ignored) 
> Doing the same with a ki18n app does not work:
> ```
> LANG=de_DE.UTF-8 LC_MESSAGES= LC_ALL= LANGUAGE=ru_RU.UTF-8:de:en okteta
> 
> ```
> will with the current code not result in russian translations in "ru" folder being \
> used for ki18n-based translations, but the "de_DE" on. More, other than ki18n, the \
> Qt system locale seems to pick up the LANGUAGE setting and goes fur russian, at \
> least QLocale::system(), as used by kf5 code for picking the language to use with \
> QTranslator, results in russian catalogs ("*_ru.qm") being loaded and used. Can be \
> seen e.g. in the menu string "Show Toolbar" (in "Settings" menu, from \
> kconfig5_qt.po) or the print dialog, where most string are from Qt lib catalog. \
> Which results in a language mix in the UI. 
> Only when using just the language code "ru", matching exactly the folder name with \
> the catalog, this will give me also russian translations for anything based on \
> ki18n: ```
> LANG=de_DE.UTF-8 LC_MESSAGES= LC_ALL= LANGUAGE=ru:de:en okteta
> ```
> 
> So at least on my openSUSE Tumbleweed system both the gettext lib and the Qt locale \
> code seem to have a different treatment of the LANGUAGE env var than what the \
> current ki18n code does. No idea/experience if that is a problem in the real world \
> though with real world usage of values for the LANGUAGE var. Just hit this issues \
> during naive/clueless translation handling testing and playing around with all the \
> vars. Still, with my current knowledge I would feel better to align the behaviour \
> of ki18n more with the others. 
> Friedrich W. H. Kossebau wrote:
> Possibly things need even more fixing. So values in the list set for "LANGUAGE" \
> seem to be not only taken as they are but instead are also tried in stripped \
> variants (so "ru" catalogs are taken for "LANGUAGE=ru_RU") from what I see. But \
> that comment from gettext's ABOUT-NLS I cited earlier ("'LL_CC' combinations can be \
> abbreviated as 'LL' to denote the language's main dialect.") might also mean ki18n \
> needs to check for a catalog with the language's main dialect, in case there is \
> none for for just the language itself? 
> Question which currently prevents me from understanding more: how to know the \
> language's main dialect? Is that "ll_LL", so the same letters as used for the \
> country as used for the language? (I learned that LANG has to have the country \
> always set in the given code, so it is not a problem there) 
> So with "LANGUAGE=ll" if there is no "ll/LC_MESSAGES/foo.mo" would we also need to \
> check for "ll_LL/LC_MESSAGES/foo.mo"? 
> Friedrich W. H. Kossebau wrote:
> The code at least of the gnu implementation seems to not make a different handling \
> of the values for LANGUAGE over the values for LANG, LC_MESSAGES or LC_ALL. Is that \
> a GNU exception of the rule "to be taken as-is"? But than the whole var LANGUAGE is \
> a GNU extension from what I read. 
> I followed KCatalog::translate -> dgettext -> dcgettext -> dcigettext. There \
> `guess_category_value()` is called to get the value from the env vars  (see \
> https://code.woboq.org/gcc/intl/dcigettext.c.html#565). Which adheres to the known \
> preferences, choosing LANGUAGE over LANG, LC_MESSAGES or LC_ALL (latter three cared \
> for by `__current_locale_name()` or `_nl_locale_name(...)`, where at least the \
> latter also just takes the values as given, without further processing). The \
> returned value then is processed further without making a difference whether it \
> came from LANGUAGES or the other vars and passes all of them (split by ":" char) to \
> `_nl_find_domain()` in the same way. 
> So both by behaviour and by how I understand the linked implementation, the patch \
> as proposed here should be an improvement :) Please tell me what I missed otherwise \
> which should stop this patch.

At this point, you checked everything that there was to check, so I can only conclude \
that my understanding of the semantics of LANGUAGE was lacking. So the patch is just \
fine.

(As for the "main dialect", I wouldn't try to be yet more smart there, but leave it \
as it is. In particular, there are many cases where language code and "main" country \
code for that language are different, so getting ll and trying also ll_LL would not \
work all the time.)


- Chusslove


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/128243/#review96702
-----------------------------------------------------------


On Јун 18, 2016, 10:36 по п., Friedrich W. H. Kossebau wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/128243/
> -----------------------------------------------------------
> 
> (Updated Јун 18, 2016, 10:36 по п.)
> 
> 
> Review request for KDE Frameworks and Chusslove Illich.
> 
> 
> Repository: ki18n
> 
> 
> Description
> -------
> 
> This patch fixes two things:
> 
> a) `splitLocale(...)` had the wrong order of splitting off modifier and codeset: \
> the old code was first splitting off anything behind "." as charset (which would \
> include the modifier though if present) and only then seeing to split off anything \
> behind "@" as modifier. So with both charset and modifier present this would fail. 
> b) The locales listed in "LANGUAGE" would be, other than those in "LC_ALL", \
> "LC_MESSAGES" and "LANG", only added as they are to the list of `localeLanguages`, \
> without generating their variants. That seems unbalanced to me, as it would mean \
> KCatalog not properly detecting mo files e.g. in \
> "/usr/share/locale/ru/LC_MESSAGES/" subfolder with "LANGUAGE=ru_RU.UTF-8", "LANG=", \
> "LC_ALL=", "LC_MESSAGES=". Or is that on purpose? 
> 
> Diffs
> -----
> 
> src/klocalizedstring.cpp fc80135 
> 
> Diff: https://git.reviewboard.kde.org/r/128243/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Friedrich W. H. Kossebau
> 
> 


--===============0305899511823480279==
MIME-Version: 1.0
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: 8bit




<html>
 <body>
  <div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
   <table bgcolor="#f9f3c9" width="100%" cellpadding="12" style="border: 1px #c9c399 \
solid; border-radius: 6px; -moz-border-radius: 6px; -webkit-border-radius: 6px;">  \
<tr>  <td>
      This is an automatically generated e-mail. To reply, visit:
      <a href="https://git.reviewboard.kde.org/r/128243/">https://git.reviewboard.kde.org/r/128243/</a>
  </td>
    </tr>
   </table>
   <br />





<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <p style="margin-top: 0;">On јун 19th, 2016, 1:11 пре п. CEST, \
<b>Chusslove Illich</b> wrote:</p>  <blockquote style="margin-left: 1em; border-left: \
2px solid #d0d0d0; padding-left: 10px;">  <pre style="white-space: pre-wrap; \
white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; \
word-wrap: break-word;">a) Right, old code is obviously wrong (sigh).

b) According to Gettext convention, LANGUAGE is a fine-tuning variable and it (each \
of its colon-separated parts) is to be taken as-is, without the system trying to be \
smart.</pre>  </blockquote>




 <p>On јун 20th, 2016, 5:09 по п. CEST, <b>Friedrich W. H. Kossebau</b> \
wrote:</p>  <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; \
padding-left: 10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; \
white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;"><p \
style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: \
inherit;">a) possibly stayed undetected as there might be no locales in real world \
which have both modifier and charset set. Still is better to have the code straight, \
less confusing for anyone who might come to read it (like me :) ).</p> <p \
style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: \
inherit;">b) My (recently gained) theoretic knowledge about the LANGUAGE env var is \
mainly from ABOUT-NLS, and there it says "'LL_CC' combinations can be abbreviated as \
'LL' to denote the language's main dialect.". From which I had guessed that e.g. both \
"LANGUAGE=ru_RU:foo:bar" and "LANGUAGE=ru:foo:bar" would result in the \
LOCALEPREFIX/ru/LC_MESSAGES/" catalogs being picked up (at least when there are no \
ru_RU of course). Being a newbie on this field, I miss real-world experience surely \
and just can talk from what I understood so far and see on my system.</p> <p \
style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: \
inherit;">Now, I have not yet got to understanding the code in the gettext lib, but \
from testing on commandline "LANGUAGE=ru_RU:de:en" will result in the catalogs from \
the ru folder being used e.g. for the coreutils tool "ls":</p> <p style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;"><div \
class="codehilite" style="background: #f8f8f8"><pre style="line-height: \
125%">LANG=de_DE.UTF-8 LC_MESSAGES= LC_ALL= LANGUAGE=ru_RU.UTF-8:de:en ls --help \
</pre></div> </p>
<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">(yes, even using charset here, to show that even that \
is dealt with when set, though perhaps just ignored)</p> <p style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">Doing \
the same with a ki18n app does not work:</p> <p style="padding: 0;text-rendering: \
inherit;margin: 0;line-height: inherit;white-space: inherit;"><div class="codehilite" \
style="background: #f8f8f8"><pre style="line-height: 125%">LANG=de_DE.UTF-8 \
LC_MESSAGES= LC_ALL= LANGUAGE=ru_RU.UTF-8:de:en okteta </pre></div>
</p>
<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">will with the current code not result in russian \
translations in "ru" folder being used for ki18n-based translations, but the "de_DE" \
on. More, other than ki18n, the Qt system locale seems to pick up the LANGUAGE \
setting and goes fur russian, at least QLocale::system(), as used by kf5 code for \
picking the language to use with QTranslator, results in russian catalogs ("*_ru.qm") \
being loaded and used. Can be seen e.g. in the menu string "Show Toolbar" (in \
"Settings" menu, from kconfig5_qt.po) or the print dialog, where most string are from \
Qt lib catalog. Which results in a language mix in the UI.</p>
<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">Only when using just the language code "ru", matching \
exactly the folder name with the catalog, this will give me also russian translations \
for anything based on ki18n:</p> <p style="padding: 0;text-rendering: inherit;margin: \
0;line-height: inherit;white-space: inherit;"><div class="codehilite" \
style="background: #f8f8f8"><pre style="line-height: 125%">LANG=de_DE.UTF-8 \
LC_MESSAGES= LC_ALL= LANGUAGE=ru:de:en okteta </pre></div>
</p>
<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">So at least on my openSUSE Tumbleweed system both the \
gettext lib and the Qt locale code seem to have a different treatment of the LANGUAGE \
env var than what the current ki18n code does. No idea/experience if that is a \
problem in the real world though with real world usage of values for the LANGUAGE \
var. Just hit this issues during naive/clueless translation handling testing and \
playing around with all the vars. Still, with my current knowledge I would feel \
better to align the behaviour of ki18n more with the others.</p></pre>  </blockquote>





 <p>On јун 20th, 2016, 6:50 по п. CEST, <b>Friedrich W. H. Kossebau</b> \
wrote:</p>  <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; \
padding-left: 10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; \
white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;"><p \
style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: \
inherit;">Possibly things need even more fixing. So values in the list set for \
"LANGUAGE" seem to be not only taken as they are but instead are also tried in \
stripped variants (so "ru" catalogs are taken for "LANGUAGE=ru_RU") from what I see. \
But that comment from gettext's ABOUT-NLS I cited earlier ("'LL_CC' combinations can \
be abbreviated as 'LL' to denote the language's main dialect.") might also mean ki18n \
needs to check for a catalog with the language's main dialect, in case there is none \
for for just the language itself?</p> <p style="padding: 0;text-rendering: \
inherit;margin: 0;line-height: inherit;white-space: inherit;">Question which \
currently prevents me from understanding more: how to know the language's main \
dialect? Is that "ll_LL", so the same letters as used for the country as used for the \
language? (I learned that LANG has to have the country always set in the given code, \
so it is not a problem there)</p> <p style="padding: 0;text-rendering: \
inherit;margin: 0;line-height: inherit;white-space: inherit;">So with "LANGUAGE=ll" \
if there is no "ll/LC_MESSAGES/foo.mo" would we also need to check for \
"ll_LL/LC_MESSAGES/foo.mo"?</p></pre>  </blockquote>





 <p>On јун 26th, 2016, 9:27 по п. CEST, <b>Friedrich W. H. Kossebau</b> \
wrote:</p>  <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; \
padding-left: 10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; \
white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;"><p \
style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: \
inherit;">The code at least of the gnu implementation seems to not make a different \
handling of the values for LANGUAGE over the values for LANG, LC_MESSAGES or LC_ALL. \
Is that a GNU exception of the rule "to be taken as-is"? But than the whole var \
LANGUAGE is a GNU extension from what I read.</p> <p style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">I \
followed KCatalog::translate -&gt; dgettext -&gt; dcgettext -&gt; dcigettext. There \
<code style="text-rendering: inherit;color: #4444cc;padding: 0;white-space: \
normal;margin: 0;line-height: inherit;">guess_category_value()</code> is called to \
get the value from the env vars  (see \
https://code.woboq.org/gcc/intl/dcigettext.c.html#565). Which adheres to the known \
preferences, choosing LANGUAGE over LANG, LC_MESSAGES or LC_ALL (latter three cared \
for by <code style="text-rendering: inherit;color: #4444cc;padding: 0;white-space: \
normal;margin: 0;line-height: inherit;">__current_locale_name()</code> or <code \
style="text-rendering: inherit;color: #4444cc;padding: 0;white-space: normal;margin: \
0;line-height: inherit;">_nl_locale_name(...)</code>, where at least the latter also \
just takes the values as given, without further processing). The returned value then \
is processed further without ma  king a difference whether it came from LANGUAGES or \
the other vars and passes all of them (split by ":" char) to <code \
style="text-rendering: inherit;color: #4444cc;padding: 0;white-space: normal;margin: \
0;line-height: inherit;">_nl_find_domain()</code> in the same way.</p> <p \
style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: \
inherit;">So both by behaviour and by how I understand the linked implementation, the \
patch as proposed here should be an improvement :) Please tell me what I missed \
otherwise which should stop this patch.</p></pre>  </blockquote>








</blockquote>

<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">At this point, you \
checked everything that there was to check, so I can only conclude that my \
understanding of the semantics of LANGUAGE was lacking. So the patch is just fine.

(As for the &quot;main dialect&quot;, I wouldn&#39;t try to be yet more smart there, \
but leave it as it is. In particular, there are many cases where language code and \
&quot;main&quot; country code for that language are different, so getting ll and \
trying also ll_LL would not work all the time.)</pre> <br />










<p>- Chusslove</p>


<br />
<p>On јун 18th, 2016, 10:36 по п. CEST, Friedrich W. H. Kossebau wrote:</p>








<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="12" style="border: \
1px #888a85 solid; border-radius: 6px; -moz-border-radius: 6px; \
-webkit-border-radius: 6px;">  <tr>
  <td>

<div>Review request for KDE Frameworks and Chusslove Illich.</div>
<div>By Friedrich W. H. Kossebau.</div>


<p style="color: grey;"><i>Updated Јун 18, 2016, 10:36 по п.</i></p>









<div style="margin-top: 1.5em;">
 <b style="color: #575012; font-size: 10pt;">Repository: </b>
ki18n
</div>


<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
 <table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" \
style="border: 1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;"><p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">This patch fixes two things:</p> <p style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">a) \
<code style="text-rendering: inherit;color: #4444cc;padding: 0;white-space: \
normal;margin: 0;line-height: inherit;">splitLocale(...)</code> had the wrong order \
of splitting off modifier and codeset: the old code was first splitting off anything \
behind "." as charset (which would include the modifier though if present) and only \
then seeing to split off anything behind "@" as modifier. So with both charset and \
modifier present this would fail.</p> <p style="padding: 0;text-rendering: \
inherit;margin: 0;line-height: inherit;white-space: inherit;">b) The locales listed \
in "LANGUAGE" would be, other than those in "LC_ALL", "LC_MESSAGES" and "LANG", only \
added as they are to the list of <code style="text-rendering: inherit;color: \
#4444cc;padding: 0;white-space: normal;margin: 0;line-height: \
inherit;">localeLanguages</code>, without generating their variants. That seems \
unbalanced to me, as it would mean KCatalog not properly detecting mo files e.g. in \
"/usr/share/locale/ru/LC_MESSAGES/" subfolder with "LANGUAGE=ru_RU.UTF-8", "LANG=", \
"LC_ALL=", "LC_MESSAGES=". Or is that on purpose?</p></pre>  </td>
 </tr>
</table>



<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">

 <li>src/klocalizedstring.cpp <span style="color: grey">(fc80135)</span></li>

</ul>

<p><a href="https://git.reviewboard.kde.org/r/128243/diff/" style="margin-left: \
3em;">View Diff</a></p>






  </td>
 </tr>
</table>







  </div>
 </body>
</html>


--===============0305899511823480279==--


[Attachment #3 (text/plain)]

_______________________________________________
Kde-frameworks-devel mailing list
Kde-frameworks-devel@kde.org
https://mail.kde.org/mailman/listinfo/kde-frameworks-devel


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic