List: python-list
Subject: Re: [Python-Dev] Unicode decode exception
From: Chris Angelico <rosuav () gmail ! com>
Date: 2014-11-30 20:19:11
Message-ID: CAPTjJmqbUm8g0qszy-RamDQrDQgb1O3ievY+86dRJO=9Jr5weA () mail ! gmail ! com
On Sun, Nov 30, 2014 at 7:07 PM, balaji marisetti
<balajimarisetti@gmail.com> wrote:
> Hi,
Hi. This list is for the development *of* Python, not development
*with* Python, so I'm sending this reply also to
python-list@python.org where it can be better handled. You'll probably
want to subscribe here:
https://mail.python.org/mailman/listinfo/python-list
or alternatively, point a news reader at comp.lang.python. Let's
continue this conversation on python-list rather than python-dev.
> When I try to iterate through the lines of a
> file("openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c"), I get a
> UnicodeDecodeError (in python 3.4.0 on Ubuntu 14.04). But there is no
> such error with python 2.7.6. What could be the problem?
The difference between the two Python versions is that 2.7 lets you be
a bit sloppy about Unicode vs bytes, but 3.4 requires that you keep
them properly separate.
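As a minimal sketch of what "properly separate" means in Python 3: bytes
only become text through an explicit decode, and the decode fails if the
codec doesn't match the data. The sample bytes and the try_decode helper
below are made up for illustration, they're not from your file.

```python
# Python 3: bytes and str are distinct types; decoding is an explicit step.
raw = b"caf\xe9"  # illustrative value: "cafe" with e-acute, encoded as Latin-1


def try_decode(data, encoding):
    """Hypothetical helper: return the decoded text, or the exception raised."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


latin1_result = try_decode(raw, "latin-1")  # matches the data, so this decodes
utf8_result = try_decode(raw, "utf-8")      # 0xe9 alone is not valid UTF-8

print(repr(latin1_result))
print(repr(utf8_result))
```

In 2.7, iterating over a file opened this way just hands you byte strings,
so no decoding happens and nothing can fail at that point.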
> In [39]: with open("openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c") as f:
>     for line in f:
>         print (line)
>
> ---------------------------------------------------------------------------
> UnicodeDecodeError Traceback (most recent call last)
> <ipython-input-39-24a3ae32a691> in <module>()
> 1 with open("../openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c") as f:
> ----> 2 for line in f:
> 3 print (line)
> 4
>
> /usr/lib/python3.4/codecs.py in decode(self, input, final)
> 311 # decode input (taking the buffer into account)
> 312 data = self.buffer + input
> --> 313 (result, consumed) = self._buffer_decode(data,
> self.errors, final)
> 314 # keep undecoded input until the next call
> 315 self.buffer = data[consumed:]
>
>
> --
> :-)balaji
Most likely, the line of input that you just reached contains a byte
that isn't valid in the default encoding. When you don't pass an
encoding, open() takes it from your locale
(locale.getpreferredencoding(False)), which on your system is probably
UTF-8 or ASCII. (Though without the actual exception message, I can't
be sure of that.) The best fix would be to know what the file's
encoding is, and simply add that as a parameter to your open() call -
perhaps this:
    with open("filename", encoding="utf-8") as f:
If you use the right encoding, and the file is correctly encoded, you
should have no errors. If you still have errors... welcome to data
problems, life can be hard. :|
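Here's a rough, self-contained sketch of the options. It writes a small
stand-in file rather than touching the OpenSSL source, and it assumes
Latin-1 purely for the demo; I don't know what encoding your file
actually uses. The errors="replace" variant is a fallback for when you
can't find out, at the cost of losing the undecodable bytes.

```python
import locale
import os
import tempfile

# What open() falls back to when no encoding is given (platform dependent).
default_encoding = locale.getpreferredencoding(False)
print("default encoding:", default_encoding)

# Stand-in file (an assumption for the demo): one byte that is not valid UTF-8.
fd, path = tempfile.mkstemp(suffix=".c")
with os.fdopen(fd, "wb") as f:
    f.write(b"/* author: Andr\xe9 */\nint x;\n")

failure = None
strict_lines = None
try:
    # Wrong guess: the decoder raises when it reaches the bad byte.
    try:
        with open(path, encoding="utf-8") as f:
            strict_lines = list(f)
    except UnicodeDecodeError as exc:
        failure = exc

    # Right guess (known here only because we wrote the file as Latin-1).
    with open(path, encoding="latin-1") as f:
        latin1_lines = list(f)

    # Unknown encoding: keep reading and mark bad bytes with U+FFFD.
    with open(path, encoding="utf-8", errors="replace") as f:
        replaced_lines = list(f)
finally:
    os.remove(path)

print("strict utf-8 failed with:", failure)
print(latin1_lines)
print(replaced_lines)
```

If all you need is the raw content and not text, opening the file with
mode "rb" sidesteps decoding entirely and gives you bytes, much as 2.7 did.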
ChrisA
--
https://mail.python.org/mailman/listinfo/python-list