
List:       nix-dev
Subject:    Re: [Nix-dev] Unicode locale for build environments
From:       Freddy Rietdijk <freddyrietdijk () fridh ! nl>
Date:       2017-06-25 16:04:38
Message-ID: CAOQtOH3J5q4+BSt7WjvdhLTrbVG4fzQkBEM+YqDOD+mNpm3J5A () mail ! gmail ! com

There was an earlier discussion on the issue tracker about glibcLocales and C.UTF-8:
https://github.com/NixOS/nixpkgs/issues/20192

For Python 3.x, I think we could add a minimal glibcLocales that provides
en_US.UTF-8 and set LC_ALL in `buildPythonPackage`. This would apply only
at build time, not at run time.
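
A minimal Python sketch of the build-time-only idea, assuming a hypothetical helper `build_time_env` (not part of Nixpkgs or `buildPythonPackage`): LC_ALL is set only in the environment handed to the build step, so the caller's own environment, and therefore run-time behaviour, is left alone. The optional `LOCALE_ARCHIVE` entry is an assumption about how a Nix-patched glibc would be pointed at glibcLocales' `locale-archive` file; the path is a placeholder.

```python
import os
import subprocess
import sys


def build_time_env(base_env, lc_all="en_US.UTF-8", locale_archive=None):
    """Return a copy of base_env with a UTF-8 locale for one build step.

    Hypothetical helper for illustration. base_env is not modified, so
    only processes that receive the returned dict see the new locale.
    """
    env = dict(base_env)
    env["LC_ALL"] = lc_all
    if locale_archive is not None:
        # Assumption: glibc is told where the locale data lives via this
        # variable, e.g. the locale-archive shipped by glibcLocales.
        env["LOCALE_ARCHIVE"] = locale_archive
    return env


def run_build_step(argv, base_env, **kwargs):
    """Run one build command with the build-time locale applied."""
    env = build_time_env(base_env, **kwargs)
    return subprocess.run(argv, env=env, check=True)


if __name__ == "__main__":
    # The child process reports the LC_ALL it was given; the parent's
    # os.environ is unchanged afterwards.
    run_build_step(
        [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"],
        os.environ,
    )
```

Copying the environment rather than mutating `os.environ` is what keeps the setting scoped to the build.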

On Sun, Jun 25, 2017 at 5:57 PM, Benno Fünfstück <benno.fuenfstueck@gmail.com> wrote:

> Hello list,
>
> Right now, the stdenv does not appear to set any locale. I think this means
> that the locale defaults to C, which specifies ASCII as the character
> encoding. For example, Python then defaults to `ASCII`, so it will fail if
> any script tries to open a file containing non-ASCII characters:
>
> $ nix-shell --pure -p python36 --command 'python -c "import locale; print(locale.getpreferredencoding())"'
> ANSI_X3.4-1968
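
`ANSI_X3.4-1968` is the name glibc reports for plain ASCII. A small Python check, written for illustration (the helper names are hypothetical, not from any existing tool), that maps the reported name to Python's canonical codec name and flags a non-UTF-8 build environment:

```python
import codecs
import locale


def canonical_codec(name):
    """Map an encoding name such as 'ANSI_X3.4-1968' to Python's codec name."""
    return codecs.lookup(name).name


def preferred_encoding_is_utf8():
    """True if the locale's preferred encoding is UTF-8.

    On Python 3.6, text-mode open() without encoding= falls back to
    locale.getpreferredencoding(False); under the C locale that is ASCII.
    """
    return canonical_codec(locale.getpreferredencoding(False)) == "utf-8"


if __name__ == "__main__":
    enc = locale.getpreferredencoding(False)
    print(enc, "->", canonical_codec(enc))
    if not preferred_encoding_is_utf8():
        print("warning: build locale is not UTF-8; consider setting LC_ALL")
```

In the environment shown above, `canonical_codec` maps `ANSI_X3.4-1968` to `ascii`, so the check would warn.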
>
> Just recently, I've hit a build that failed due to that:
>
> Traceback (most recent call last):
>   File "nix_run_setup.py", line 8, in <module>
>     exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\\r\\n', '\\n'), __file__, 'exec'))
>   File "setup.py", line 20, in <module>
>     long_description=open('README.rst').read(),
>   File "/nix/store/i5ixvcy4i6jqzlzy9aajdhf3wliixvh1-python3-3.6.1/lib/python3.6/encodings/ascii.py", line 26, in decode
>     return codecs.ascii_decode(input, self.errors)[0]
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 542: ordinal not in range(128)
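
Byte 0xc3 is the lead byte of a two-byte UTF-8 sequence (for example, `ü` encodes as `c3 bc`), so the README is valid UTF-8 that the ASCII codec cannot decode. Independently of any stdenv change, a package's `setup.py` can sidestep the locale by naming the encoding explicitly. The sketch below uses hypothetical helper names and a made-up README; it is not the code of the package in the traceback.

```python
import os
import tempfile


def read_long_description(path):
    """Read README text as UTF-8, whatever the build locale is."""
    # An explicit encoding= stops open() from using the locale's default,
    # which is ASCII under the C locale.
    with open(path, encoding="utf-8") as f:
        return f.read()


def ascii_decode_error(data):
    """Return the UnicodeDecodeError raised when decoding data as ASCII, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


if __name__ == "__main__":
    text = "Benno F\u00fcnfst\u00fcck\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "README.rst")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, "rb") as f:
            raw = f.read()
        print(ascii_decode_error(raw))  # the error a C-locale build hits
        print(read_long_description(path) == text)
```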
>
> As UTF-8 is nowadays almost always used (I have yet to see a source
> archive that does not use UTF-8), I propose that we make the stdenv support
> UTF-8 by default. Would this be a feasible approach? (Whether to use
> C.UTF-8 or some other UTF-8 locale such as en_US.UTF-8 still needs to be
> decided.)
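
Whichever name is chosen, it only helps if the C library can actually load that locale; otherwise `setlocale` fails and programs stay in the C locale. A Python sketch that probes candidate names in order (the candidate list and helper name are illustrative assumptions, not a decided policy):

```python
import locale


def first_available_locale(candidates):
    """Return the first candidate locale the C library accepts, else None.

    Probing calls setlocale(LC_ALL, name), so the current setting is
    saved first and restored before returning.
    """
    saved = locale.setlocale(locale.LC_ALL)  # query only, changes nothing
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_ALL, name)
            except locale.Error:
                continue  # no locale data available for this name
            return name
        return None
    finally:
        locale.setlocale(locale.LC_ALL, saved)


if __name__ == "__main__":
    print(first_available_locale(["C.UTF-8", "en_US.UTF-8"]))
```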
>
> Regards,
> Benno
>
> _______________________________________________
> nix-dev mailing list
> nix-dev@lists.science.uu.nl
> https://mailman.science.uu.nl/mailman/listinfo/nix-dev
>
>
