Thanks a lot. I will try that on the weekend.

Claus

> Claus Hausberger wrote:
> > Thanks a lot. Now I am one step further but I get another strange error:
> >
> > Traceback (most recent call last):
> >   File "./read.py", line 12, in
> >     of.write(text)
> > UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)
> >
> > According to Google, \ufeff has something to do with byte order.
> >
> > I use a Linux system, maybe this helps to find the error.
> >
> 'text' contains Unicode, but you're writing it to a file that's not
> opened for Unicode. Either open the output file for Unicode:
>
>     of = codecs.open("umlaut-out.txt", "w", encoding="latin1")
>
> or encode the text before writing:
>
>     text = text.encode("latin1")
>
> (I'm assuming you want the output file to be in Latin1.)
>
> >> Claus Hausberger wrote:
> >>
> >>> I have a text file which is encoded in Latin1 (ISO-8859-1). I can't
> >>> change that as I do not create those files myself. I have to read
> >>> those files and convert the umlauts like ö to stuff like &ouml; as
> >>> the text files should become html files.
> >>
> >> umlaut-in.txt:
> >> ----
> >> This file is contains data in the unicode
> >> character set and is encoded with utf-8.
> >> Viele Röhre. Macht spaß!  Tsüsch!
> >>
> >>
> >> umlaut-in.txt hexdump:
> >> ----
> >> 000000: 54 68 69 73 20 66 69 6C 65 20 69 73 20 63 6F 6E  This file is con
> >> 000010: 74 61 69 6E 73 20 64 61 74 61 20 69 6E 20 74 68  tains data in th
> >> 000020: 65 20 75 6E 69 63 6F 64 65 0D 0A 63 68 61 72 61  e unicode..chara
> >> 000030: 63 74 65 72 20 73 65 74 20 61 6E 64 20 69 73 20  cter set and is
> >> 000040: 65 6E 63 6F 64 65 64 20 77 69 74 68 20 75 74 66  encoded with utf
> >> 000050: 2D 38 2E 0D 0A 56 69 65 6C 65 20 52 C3 B6 68 72  -8...Viele R..hr
> >> 000060: 65 2E 20 4D 61 63 68 74 20 73 70 61 C3 9F 21 20  e. Macht spa..!
> >> 000070: 20 54 73 C3 BC 73 63 68 21 0D 0A 00 00 00 00 00   Ts..sch!.......
> >>
> >>
> >> umlaut.py:
> >> ----
> >> # -*- coding: utf-8 -*-
> >> import codecs
> >> text=codecs.open("umlaut-in.txt",encoding="utf-8").read()
> >> text=text.replace(u"ö",u"oe")
> >> text=text.replace(u"ß",u"ss")
> >> text=text.replace(u"ü",u"ue")
> >> of=open("umlaut-out.txt","w")
> >> of.write(text)
> >> of.close()
> >>
> >>
> >> umlaut-out.txt:
> >> ----
> >> This file is contains data in the unicode
> >> character set and is encoded with utf-8.
> >> Viele Roehre. Macht spass!  Tsuesch!
> >>
> >>
> >> umlaut-out.txt hexdump:
> >> ----
> >> 000000: 54 68 69 73 20 66 69 6C 65 20 69 73 20 63 6F 6E  This file is con
> >> 000010: 74 61 69 6E 73 20 64 61 74 61 20 69 6E 20 74 68  tains data in th
> >> 000020: 65 20 75 6E 69 63 6F 64 65 0D 0D 0A 63 68 61 72  e unicode...char
> >> 000030: 61 63 74 65 72 20 73 65 74 20 61 6E 64 20 69 73  acter set and is
> >> 000040: 20 65 6E 63 6F 64 65 64 20 77 69 74 68 20 75 74   encoded with ut
> >> 000050: 66 2D 38 2E 0D 0D 0A 56 69 65 6C 65 20 52 6F 65  f-8....Viele Roe
> >> 000060: 68 72 65 2E 20 4D 61 63 68 74 20 73 70 61 73 73  hre. Macht spass
> >> 000070: 21 20 20 54 73 75 65 73 63 68 21 0D 0D 0A 00 00  !  Tsuesch!.....
> >>
> >>
> >> --
> >> "The ability of the OSS process to collect and harness
> >> the collective IQ of thousands of individuals across
> >> the Internet is simply amazing." - Vinod Valloppillil
> >> http://www.catb.org/~esr/halloween/halloween4.html
> >
>
> --
> http://mail.python.org/mailman/listinfo/python-list
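
For reference, the goal stated at the start of this thread (Latin1 text in, HTML entities such as &ouml; out) could be sketched roughly as below. This is only an illustration under some assumptions, not code posted by anyone in the thread: it assumes a Python 3 interpreter (the scripts quoted above are Python 2), the names to_html_entities and convert_file are made up for this example, and it assumes that a leading U+FEFF (the byte order mark seen in the traceback) should simply be dropped. Opening the files with newline="" is meant to pass line endings through unchanged.

```python
from html.entities import codepoint2name


def to_html_entities(text):
    """Return text with every non-ASCII character replaced by an HTML entity."""
    # Assumption: a leading byte order mark (U+FEFF) is unwanted, so drop it.
    if text.startswith("\ufeff"):
        text = text[1:]
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Plain ASCII is passed through untouched (including & < >).
            parts.append(ch)
        elif code in codepoint2name:
            # Named entity, e.g. ö -> &ouml;
            parts.append("&%s;" % codepoint2name[code])
        else:
            # No named entity known: fall back to a numeric reference.
            parts.append("&#%d;" % code)
    return "".join(parts)


def convert_file(src, dst, src_encoding="latin-1"):
    """Read src in src_encoding and write an ASCII-only copy to dst."""
    # newline="" disables newline translation on both read and write.
    with open(src, "r", encoding=src_encoding, newline="") as infile:
        text = infile.read()
    with open(dst, "w", encoding="ascii", newline="") as outfile:
        outfile.write(to_html_entities(text))
```

Because the result contains only ASCII, writing it cannot trigger the UnicodeEncodeError shown above. Something like convert_file("umlaut-in.txt", "umlaut-out.html") would be the intended use; pass src_encoding="utf-8-sig" instead if the input turns out to be UTF-8 with a byte order mark. The sketch does not escape &, < or >, so text that may contain markup characters would need that handled separately.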