List:       kwrite-devel
Subject:    Re: Using i18n
From:       "Philipp A." <flying-sheep () web ! de>
Date:       2013-05-06 19:28:10
Message-ID: CAN8d9gmuogP0LCdT9c=9jRgDwQTCXM4CfkPjOwVExNm_typxiQ () mail ! gmail ! com


Since it already accepts text strings that contain only ASCII characters
(ASCII being a subset of UTF-8), the best and most compatible fixes would,
IMHO, be:

1. Simply check whether an argument is a text string (str, not bytes) and,
if it is, encode it as UTF-8 before passing it on to the C++ i18n, as the
following Python code does (just an example; doing it here
<https://projects.kde.org/projects/kde/kdebindings/pykde4/repository/revisions/master/entry/sip/kdecore/klocalizedstring.sip#L66>
is more efficient):
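   import PyKDE4.kdecore

   def i18n(s):
       # Text (str on Python 3, unicode on Python 2): encode it as UTF-8 first
       if not isinstance(s, bytes):
           s = s.encode('utf-8')
       # Byte strings are passed through unchanged, so existing callers keep working
       return PyKDE4.kdecore.i18n(s)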
2. Alternatively, find the deeply embedded place where the text gets encoded
with the ASCII codec, and replace that with UTF-8.
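Option 1 can also be written as a reusable decorator, so the same normalization can wrap any function that expects UTF-8 bytes. This is a minimal sketch, not PyKDE4 code: `utf8_message` and `fake_i18n` are hypothetical names, and `fake_i18n` only mimics the behaviour described in this thread (UTF-8 bytes in, text out). Only the message argument is encoded; further arguments are forwarded untouched, since the thread does not say how the binding treats them.

```python
import functools


def utf8_message(func):
    """Let a bytes-only, i18n-style function also accept text.

    Only the first (message) argument is normalized; further
    arguments are forwarded unchanged.
    """
    @functools.wraps(func)
    def wrapper(message, *args):
        if not isinstance(message, bytes):
            # Text input: encode it as UTF-8, the encoding the C++
            # i18n expects for its message string.
            message = message.encode('utf-8')
        return func(message, *args)
    return wrapper


def fake_i18n(message, *args):
    """Hypothetical stand-in for the binding: UTF-8 bytes in, text out."""
    if not isinstance(message, bytes):
        raise TypeError('expected UTF-8 encoded bytes')
    return message.decode('utf-8')


# With PyKDE4 installed one would wrap the real function instead
# (untested here): i18n = utf8_message(PyKDE4.kdecore.i18n)
i18n = utf8_message(fake_i18n)
```

With this wrapper, `i18n('·')` and `i18n(b'\xc2\xb7')` both return `'·'`, and plain ASCII strings behave exactly as before.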

Either way, everything that worked before continues to work, and strings
containing non-ASCII characters will start working.
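The compatibility argument holds because ASCII is a subset of UTF-8: any string the ASCII codec can encode comes out byte-for-byte identical under the UTF-8 codec. A quick Python check of that claim follows; `ascii_conversion`, `utf8_conversion` and `newly_supported` are hypothetical helpers modelling the current and proposed str-to-bytes step, not actual PyKDE4 internals.

```python
def ascii_conversion(text):
    """Models the reported current behaviour: text encoded as ASCII."""
    return text.encode('ascii')


def utf8_conversion(text):
    """Models the proposed behaviour: text encoded as UTF-8."""
    return text.encode('utf-8')


def newly_supported(samples):
    """Return the samples that only the UTF-8 conversion can handle.

    Raises ValueError if the two conversions disagree on any sample
    that the ASCII conversion already accepted.
    """
    gained = []
    for text in samples:
        # UTF-8 covers every Unicode character (lone surrogates aside)
        new = utf8_conversion(text)
        try:
            old = ascii_conversion(text)
        except UnicodeEncodeError:
            gained.append(text)  # failed before, works after the fix
            continue
        if old != new:
            raise ValueError('behaviour changed for %r' % text)
    return gained
```

For example, `newly_supported(['foo %1.', 'Paté', '·'])` returns `['Paté', '·']`: the ASCII sample is unchanged, and the other two only start working with UTF-8.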

On IRC someone suggested that QStrings are ASCII by default, so the second
option might not be worth it.


2013/5/6 Albert Astals Cid <aacid@kde.org>

> On Saturday, 4 May 2013, at 21:24:37, Philipp A. wrote:
> > > i18n only works on utf8 formatted "ascii strings"
> > 
> > No, as I said, this works: i18n('·'.encode('utf-8'))
> > 
> > '·' is unicode, not ASCII. When I encode it to UTF-8 bytes and call i18n
> > on it, i18n flawlessly returns a unicode string, which means that it
> > decodes the bytes it is passed as UTF-8, not ASCII. Only if I pass it a
> > unicode string does it inexplicably try to *encode* it to bytes with the
> > ASCII codec. No idea why, but it's true.
> > 
> > So i18n actually works on either
> > 1. *UTF-8-encoded* byte strings covering the whole Unicode range
> 
> Yeah, that's what I meant by 'utf8 formatted "ascii strings"'; wrong
> wording on my side.
> 
> That is of course what the C++ i18n function expects; I'm not sure about
> the expectations of the Python counterpart.
> 
> Cheers,
> Albert
> 
> > 2. unicode strings that happen to contain only characters from the
> > ASCII range.
> > 
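> > So it accepts only objects that survive the following treatment:
> > 
> > import sys
> > 
> > def test(t):
> >     # Text type is str on Python 3 and unicode on Python 2
> >     u = str if sys.version_info.major == 3 else unicode
> >     if isinstance(t, u):
> >         # Text is encoded with the ASCII codec, so any non-ASCII
> >         # character raises UnicodeEncodeError
> >         return t.encode('ascii')
> >     else:
> >         # Anything else must already be bytes and is passed through
> >         assert isinstance(t, bytes)
> >         return t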
> > 
> > 2013/5/4 Albert Astals Cid <aacid@kde.org>
> > 
> > > On Friday, 3 May 2013, at 22:17:29, Philipp A. wrote:
> > > > Well, it is:
> > > > 
> > > > >>> from sys import version_info
> > > > >>> version_info[:2]
> > > > (3, 3)
> > > > >>> from PyKDE4.kdecore import versionString, i18n
> > > > >>> versionString()
> > > > '4.10.2'
> > > > >>> i18n('·'.encode('utf-8'))
> > > > '·'
> > > > >>> print(i18n('·'))
> > > > Traceback (most recent call last):
> > > >   File "<stdin>", line 1, in <module>
> > > > UnicodeEncodeError: 'ascii' codec can't encode character '\xb7' in position 0: ordinal not in range(128)
> > > > 
> > > > Note that the result of i18n *is* a unicode string, and i18n *accepts*
> > > > unicode strings, but only if those strings happen to contain only
> > > > ASCII, just like in the bad old Python 2 days.
> > > > 
> > > > So i18n is buggy on KDE 4.10, and we have to work around it.
> > > 
> > > Why is it buggy? i18n only works on utf8 formatted "ascii strings"
> > > 
> > > Are you expecting something in the python part to do some magic?
> > > 
> > > Cheers,
> > > 
> > > Albert
> > > 
> > > > 2013/5/3 Shaheed Haque <srhaque@theiet.org>
> > > > 
> > > > > Just after I hit "send", I found this:
> > > > > 
> > > > > http://www.mail-archive.com/pyqt@riverbankcomputing.com/msg14058.html
> > > > > 
> > > > > which suggests this is not an issue???
> > > > > 
> > > > > On 3 May 2013 20:42, Shaheed Haque <srhaque@theiet.org> wrote:
> > > > > > Hi Philipp,
> > > > > > 
> > > > > > On 3 May 2013 19:56, Philipp A. <flying-sheep@web.de> wrote:
> > > > > > > Hi, I've seen some uses of kdecore.i18n popping up in Paté plugins,
> > > > > > > and I have some recommendations:
> > > > > > > 
> > > > > > > 1. It takes more than one argument, so for the sake of consistency,
> > > > > > > instead of the ugly
> > > > > > > 
> > > > > > >     i18n(b'foo %(name)s.') % {'name': 'bar'}
> > > > > > > 
> > > > > > > or the somewhat better
> > > > > > > 
> > > > > > >     i18n(b'foo {name}.').format(name='bar')
> > > > > > > 
> > > > > > > we should use the Qt style
> > > > > > > 
> > > > > > >     i18n(b'foo %1.', 'bar')
> > > > > > > 
> > > > > > > 2. i18n takes byte strings, even on Python 3. This means that
> > > > > > > whenever a developer accustomed to Python 2 uses it without knowing
> > > > > > > this, the plugin WILL break for Python 3 users.
> > > > > > 
> > > > > > I've been using the argument syntax of the third form, but simply
> > > > > > specified quoted strings (i.e. without the "b" prefix). Without really
> > > > > > thinking about it, I had assumed that i18n would have done something
> > > > > > plausible on Python 2 (not sure exactly what, though!), and that on
> > > > > > Python 3 it would just be Unicode all the way. I'd certainly prefer to
> > > > > > avoid having to use "b" all over the place.
> > > > > > 
> > > > > > https://github.com/Werkov/PyQt4/blob/master/examples/tools/i18n/i18n.py
> > > > > > 
> > > > > > seems to suggest that something like that is possible, but when I went
> > > > > > looking for docs on this, I could not find an obvious spec. Do you
> > > > > > have a reference handy?
> > > > > > 
> > > > > > Thanks, Shaheed
> > > > > > 
> > > > > > > We have to come up with a solution.
> > > > > > > 
> > > > > > > There is a possible solution here, but it involves a fairly
> > > > > > > convoluted i18n replacement:
> > > > > > > https://projects.kde.org/projects/kde/applications/kate/repository/revisions/master/entry/addons/kate/pate/src/plugins/python_console_ipython/python_console_ipython.py#L36
> > > > > > > 
> > > > > > > Should we add that function to libkatepate and call it a day?
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > KWrite-Devel mailing list
> > > > > > > KWrite-Devel@kde.org
> > > > > > > https://mail.kde.org/mailman/listinfo/kwrite-devel
> 

