List: kwrite-devel
Subject: Re: Using i18n
From: "Philipp A." <flying-sheep () web ! de>
Date: 2013-05-06 19:28:10
Message-ID: CAN8d9gmuogP0LCdT9c=9jRgDwQTCXM4CfkPjOwVExNm_typxiQ () mail ! gmail ! com
since it already accepts unicode strings that contain only ASCII characters
(ASCII being a subset of utf-8), the best and most compatible fixes would
be, imho:

1. either simply check whether an argument is a unicode string (not bytes),
encode it as utf-8 if it is, and then pass it on to the C++ i18n, like
this python code does (just an example; doing it here
<https://projects.kde.org/projects/kde/kdebindings/pykde4/repository/revisions/master/entry/sip/kdecore/klocalizedstring.sip#L66>
is more efficient):
    import PyKDE4.kdecore

    def i18n(s):
        # unicode strings are encoded as utf-8; bytes are passed through as-is
        if not isinstance(s, bytes):
            s = s.encode('utf-8')
        return PyKDE4.kdecore.i18n(s)

2. or find the deeply embedded place where the string gets encoded using the
ascii codec, and replace that codec with utf-8.

either way, everything that worked before continues to work, and strings
containing non-ASCII characters will start working.

on IRC someone suggested that QStrings are ASCII by default, so the 2nd
option might not be worth it.
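
to make option 1 concrete, here is a minimal, self-contained sketch (python 3).
fake_kde_i18n is a hypothetical stand-in that only mimics the behaviour
reported in this thread, it is not the real binding, and to_utf8 and wrap_i18n
are names made up for this example:

```python
def to_utf8(s):
    """Return s as utf-8 bytes; byte strings are passed through untouched."""
    if isinstance(s, bytes):
        return s
    return s.encode('utf-8')


def wrap_i18n(raw_i18n):
    """Wrap a bytes-oriented i18n callable so it also takes unicode strings."""
    def i18n(s, *args):
        # only the message itself is re-encoded; substitution arguments
        # are forwarded unchanged
        return raw_i18n(to_utf8(s), *args)
    return i18n


# ASSUMPTION: hypothetical stand-in for PyKDE4.kdecore.i18n. It imitates the
# behaviour described in this thread and does no real translation lookup.
def fake_kde_i18n(s, *args):
    if not isinstance(s, bytes):
        s = s.encode('ascii')  # the step that raises UnicodeEncodeError
    text = s.decode('utf-8')
    # Qt-style placeholders: %1, %2, ...
    for n, arg in enumerate(args, 1):
        text = text.replace('%%%d' % n, str(arg))
    return text


i18n = wrap_i18n(fake_kde_i18n)
```

with the real binding one would presumably write
i18n = wrap_i18n(PyKDE4.kdecore.i18n) instead; that is untested against PyKDE4
itself. byte strings and ascii-only unicode strings take the same path as
before, so existing callers should be unaffected.
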
2013/5/6 Albert Astals Cid <aacid@kde.org>
> On Saturday, 4 May 2013, at 21:24:37, Philipp A. wrote:
> > > i18n only works on utf8 formatted "ascii strings"
> >
> > no, as i said, this works: i18n(' ·'.encode('utf-8'))
> >
> > " ·" is unicode, not ascii. when i encode it to utf-8 bytes, and call i18n
> > on it, i18n flawlessly returns a unicode string, which means that it
> > decodes the bytes it's passed as utf-8, not ascii. only if i pass it a
> > unicode string, it inexplicably tries to *encode* it to bytes with the
> > ascii codec. no idea why, but it's true.
> >
> > so what's the case is that i18n works on either
> > 1. *utf-8-encoded* byte strings of the whole unicode range
>
> Yeah, that's what i meant with 'utf8 formatted "ascii strings"', wrong
> wording on my side.
>
> That is of course what the C++ i18n function expects, not sure about the
> expectations from the python counterpart.
>
> Cheers,
> Albert
>
> > 2. unicode strings which happen to only contain characters from the ascii
> > range.
> >
> > so it accepts only objects that survive the following treatment:
> >
> > import sys
> >
> > def test(t):
> >     u = str if sys.version_info.major == 3 else unicode
> >     if isinstance(t, u):
> >         return t.encode('ascii')
> >     else:
> >         assert isinstance(t, bytes)
> >         return t
> >
> > 2013/5/4 Albert Astals Cid <aacid@kde.org>
> >
> > > On Friday, 3 May 2013, at 22:17:29, Philipp A. wrote:
> > > > well, it is:
> > > >
> > > > >>> from sys import version_info
> > > > >>> version_info[:2]
> > > > (3, 3)
> > > > >>> from PyKDE4.kdecore import versionString, i18n
> > > > >>> versionString()
> > > > '4.10.2'
> > > > >>> i18n(' ·'.encode('utf-8'))
> > > > ' ·'
> > > > >>> print(i18n(' ·'))
> > > > Traceback (most recent call last):
> > > >   File "<stdin>", line 1, in <module>
> > > > UnicodeEncodeError: 'ascii' codec can't encode character '\xb7' in position 0: ordinal not in range(128)
> > > >
> > > > note that the result of i18n *is* a unicode string, and i18n *accepts*
> > > > unicode strings, but only if those unicode strings happen to only contain
> > > > ascii, just like in the bad old python2 times.
> > > >
> > > > so i18n is buggy on KDE 4.10, and we have to work around it.
> > >
> > > Why is it buggy? i18n only works on utf8 formatted "ascii strings"
> > >
> > > Are you expecting something in the python part to do some magic?
> > >
> > > Cheers,
> > >
> > > Albert
> > >
> > > > 2013/5/3 Shaheed Haque <srhaque@theiet.org>
> > > >
> > > > > Just after I hit "send", I found this:
> > > > >
> > > > >
> > > > > > http://www.mail-archive.com/pyqt@riverbankcomputing.com/msg14058.html
> > > > >
> > > > > which suggests this is not an issue???
> > > > >
> > > > > On 3 May 2013 20:42, Shaheed Haque <srhaque@theiet.org> wrote:
> > > > > > Hi Philipp,
> > > > > >
> > > > > > On 3 May 2013 19:56, Philipp A. <flying-sheep@web.de> wrote:
> > > > > > > > Hi, i've seen some uses of kdecore.i18n popping up in Paté plugins,
> > > > > > > > and have some recommendations:
> > > > > > >
> > > > > > > > 2. It takes more than one argument. so for the sake of consistency,
> > > > > > > > instead of doing the ugly
> > > > > > >
> > > > > > > i18n(b'foo %(name)s.') % { 'name': 'bar'}
> > > > > > >
> > > > > > > or even the better
> > > > > > >
> > > > > > > i18n(b'foo {name}.').format(name='bar')
> > > > > > >
> > > > > > > we should do the Qt-style
> > > > > > >
> > > > > > > i18n(b'foo %1.', 'bar')
> > > > > > >
> > > > > > > > 1. i18n takes byte strings. even on python3. this means that every time
> > > > > > > > a developer accustomed to python2 who doesn't know it tries to use it,
> > > > > > > > the plugin WILL break for python3 users.
> > > > > >
> > > > > > > I've been using the argument syntax of the third form, but simply
> > > > > > > specified quoted strings (i.e. without the "b" prefix). Without really
> > > > > > > thinking about it, I had assumed that i18n would have done something
> > > > > > > plausible on Python2 (not sure exactly what though!), and on Python3 it
> > > > > > > would just be Unicode all the way. I'd certainly prefer to avoid having
> > > > > > > to use "b" all over the place.
> > > > > > >
> > > > > > > https://github.com/Werkov/PyQt4/blob/master/examples/tools/i18n/i18n.py
> > > > > > >
> > > > > > > seems to suggest that something like that is possible, but when I went
> > > > > > > looking for some docs on this, I could not see an obvious spec. Do you
> > > > > > > have a reference handy?
> > > > > >
> > > > > > Thanks, Shaheed
> > > > > >
> > > > > > > we have to come up with a solution.
> > > > > > >
> > > > > > > > there is a possible solution here, but it involves a fairly convoluted
> > > > > > > > i18n replacement:
> > > > > > > > https://projects.kde.org/projects/kde/applications/kate/repository/revisions/master/entry/addons/kate/pate/src/plugins/python_console_ipython/python_console_ipython.py#L36
> > > > > > >
> > > > > > > should we add that function to libkatepate and call it a day?
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > KWrite-Devel mailing list
> > > > > > > KWrite-Devel@kde.org
> > > > > > > https://mail.kde.org/mailman/listinfo/kwrite-devel
> > > > >
> > > > > _______________________________________________
> > > > > KWrite-Devel mailing list
> > > > > KWrite-Devel@kde.org
> > > > > https://mail.kde.org/mailman/listinfo/kwrite-devel
> > >
> > > _______________________________________________
> > > KWrite-Devel mailing list
> > > KWrite-Devel@kde.org
> > > https://mail.kde.org/mailman/listinfo/kwrite-devel
> _______________________________________________
> KWrite-Devel mailing list
> KWrite-Devel@kde.org
> https://mail.kde.org/mailman/listinfo/kwrite-devel
>