'Problem reading file with umlauts'


List:       python-list
Subject:    Problem reading file with umlauts
From:       "Claus Hausberger" <CHausberger () gmx ! de>
Date:       2009-07-07 13:59:49
Message-ID: 20090707135949.100950 () gmx ! net

Hello

I have a text file which is encoded in Latin1 (ISO-8859-1). I can't change that as I do not create those files myself.

I have to read those files and convert the umlauts like ö to stuff like &ouml; as the text files should become html files.

I have this code:

#!/usr/bin/python
# -*- coding: latin1 -*-

import codecs

f = codecs.open('abc.txt', encoding='latin1')

for line in f:
    print line
    for c in line: 
        if c == "ö":
            print "oe"
        else:
            print c

and I get this error message:

$ ./read.py
Abc

./read.py:11: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if c == "ö":
A
b
c

Traceback (most recent call last):
  File "./read.py", line 9, in <module>
    print line
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

I checked the web and tried several approaches, but I still get strange encoding errors. Has anyone ever done this before?
I am currently using Python 2.5 and may be able to use 2.6, but I cannot yet move to 3.1 as many libs we use don't yet work with Python 3.
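
For reference, here is a minimal sketch of one way the conversion described above could be approached. It is not a confirmed fix for the script in this message. It assumes that every non-ASCII character should become an HTML entity, and that the output file can therefore be plain ASCII. The names to_html_entities, convert_file and abc.html are made up for illustration. The sketch compares and builds unicode strings only (note the u"..." literals) and never prints decoded text to the terminal, since the warning and the traceback above come from mixing byte strings with unicode and from encoding to an ASCII stdout. Characters such as & or < are not escaped here.

```python
import codecs

try:
    from htmlentitydefs import codepoint2name   # Python 2
except ImportError:
    from html.entities import codepoint2name    # Python 3


def to_html_entities(text):
    """Replace each non-ASCII character of a unicode string with an HTML entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                              # plain ASCII stays as it is
        elif code in codepoint2name:
            out.append(u"&%s;" % codepoint2name[code])  # named entity, e.g. &ouml;
        else:
            out.append(u"&#%d;" % code)                 # numeric fallback
    return u"".join(out)


def convert_file(src_path, dst_path):
    """Read a Latin-1 text file and write an ASCII-only copy with entities.

    Hypothetical helper; the file names passed in are up to the caller.
    """
    src = codecs.open(src_path, "r", encoding="latin1")
    dst = codecs.open(dst_path, "w", encoding="ascii")
    try:
        for line in src:
            dst.write(to_html_entities(line))
    finally:
        src.close()
        dst.close()
```

It could be called as convert_file('abc.txt', 'abc.html'). If only numeric entities are acceptable, line.encode('ascii', 'xmlcharrefreplace') on a unicode line is a shorter built-in alternative.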

Any help is more than welcome. This has been driving me crazy for two days now.

best wishes

Claus
