List:       python-list
Subject:    String multi-replace
From:       Sorin Schwimmer <sxn02 () yahoo ! com>
Date:       2010-11-18 4:21:06
Message-ID: 114148.34295.qm () web56007 ! mail ! re3 ! yahoo ! com

Hi All,

I have to eliminate diacritics in a fairly large file.

Inspired by http://code.activestate.com/recipes/81330/, I came up with the following code:

#! /usr/bin/env python

import re

nodia={chr(196)+chr(130):'A', # mamaliga
       chr(195)+chr(130):'A', # A^
       chr(195)+chr(142):'I', # I^
       chr(195)+chr(150):'O', # OE
       chr(195)+chr(156):'U', # UE
       chr(195)+chr(132):'A', # AE
       chr(197)+chr(158):'S',
       chr(197)+chr(162):'T',
       chr(196)+chr(131):'a', # mamaliga
       chr(195)+chr(162):'a', # a^
       chr(195)+chr(174):'i', # i^
       chr(195)+chr(182):'o', # oe
       chr(195)+chr(188):'u', # ue
       chr(195)+chr(164):'a', # ae
       chr(197)+chr(159):'s',
       chr(197)+chr(163):'t'
      }
name="R\xc3\xa2\xc5\x9fca"

regex = re.compile("(%s)" % "|".join(map(re.escape, nodia.keys())))
print regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], name)

But it won't work; I end up with:

Traceback (most recent call last):
  File "multirep.py", line 25, in <module>
    print regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], name)
  File "multirep.py", line 25, in <lambda>
    print regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], name)
TypeError: 'type' object is not subscriptable

What am I doing wrong?

Thanks for your advice,
SxN
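
Note on the traceback: the lambda subscripts the built-in dict type instead of the nodia mapping, which is what raises "'type' object is not subscriptable". Below is a minimal sketch of the same multi-replace with the lookup done against nodia. It assumes Python 3 and byte strings (the original script targets Python 2), uses only a subset of the table, and strip_diacritics is a hypothetical helper name introduced for illustration.

```python
import re

# UTF-8 byte sequences mapped to ASCII replacements (subset of the original table).
nodia = {
    b"\xc4\x82": b"A",  # A with breve
    b"\xc3\x82": b"A",  # A with circumflex
    b"\xc5\x9e": b"S",  # S with cedilla
    b"\xc3\xa2": b"a",  # a with circumflex
    b"\xc5\x9f": b"s",  # s with cedilla
}

# A single alternation built from all escaped keys.
regex = re.compile(b"|".join(map(re.escape, nodia)))


def strip_diacritics(data):
    # Look the matched bytes up in nodia, not in the built-in dict type.
    return regex.sub(lambda mo: nodia[mo.group(0)], data)


name = b"R\xc3\xa2\xc5\x9fca"
result = strip_diacritics(name)
print(result.decode("ascii"))
```

mo.group(0) returns the same text as mo.string[mo.start():mo.end()], so either form works once the lookup goes through nodia.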

