List:       python-list
Subject:    Re: String multi-replace
From:       Chris Rebert <clp2 () rebertia ! com>
Date:       2010-11-18 4:45:49
Message-ID: AANLkTikVd-qN1q5+vniP1BytAN2QwTKi3MY+itZTve98 () mail ! gmail ! com

On Wed, Nov 17, 2010 at 8:21 PM, Sorin Schwimmer <sxn02@yahoo.com> wrote:
> Hi All,
>
> I have to eliminate diacritics in a fairly large file.
>
> Inspired by http://code.activestate.com/recipes/81330/, I came up with the following code:
>
> #! /usr/bin/env python
>
> import re
>
> nodia={chr(196)+chr(130):'A', # mamaliga
>       chr(195)+chr(130):'A', # A^
>       chr(195)+chr(142):'I', # I^
>       chr(195)+chr(150):'O', # OE
>       chr(195)+chr(156):'U', # UE
>       chr(195)+chr(132):'A', # AE
>       chr(197)+chr(158):'S',
>       chr(197)+chr(162):'T',
>       chr(196)+chr(131):'a', # mamaliga
>       chr(195)+chr(162):'a', # a^
>       chr(195)+chr(174):'i', # i^
>       chr(195)+chr(182):'o', # oe
>       chr(195)+chr(188):'u', # ue
>       chr(195)+chr(164):'a', # ae
>       chr(197)+chr(159):'s',
>       chr(197)+chr(163):'t'
>      }
> name="R\xc3\xa2\xc5\x9fca"
>
> regex = re.compile("(%s)" % "|".join(map(re.escape, nodia.keys())))
> print regex.sub(lambda mo: nodia[mo.group(0)], name)

Have you considered using string.maketrans() and str.translate()
instead? It's simpler and likely faster than generating+using regexes
like that.
http://docs.python.org/library/string.html#string.maketrans
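
A rough sketch of what that could look like, under a few assumptions that
are not in the original post: it is written in Python 3 syntax (where
str.maketrans() takes the place of string.maketrans()), it assumes the
file is UTF-8, and it decodes the bytes before translating, since
translate() maps one character at a time and each diacritic in the dict
above is a two-byte UTF-8 sequence. The names NODIA and strip_diacritics
are made up for illustration.

```python
# Sketch only: Python 3 syntax, input assumed to be UTF-8 encoded bytes.

# Same pairs as the quoted dict, written as decoded characters
# instead of two-byte UTF-8 sequences.
NODIA = str.maketrans({
    "\u0102": "A",  # A with breve
    "\u00c2": "A",  # A^
    "\u00ce": "I",  # I^
    "\u00d6": "O",  # OE
    "\u00dc": "U",  # UE
    "\u00c4": "A",  # AE
    "\u015e": "S",
    "\u0162": "T",
    "\u0103": "a",  # a with breve
    "\u00e2": "a",  # a^
    "\u00ee": "i",  # i^
    "\u00f6": "o",  # oe
    "\u00fc": "u",  # ue
    "\u00e4": "a",  # ae
    "\u015f": "s",
    "\u0163": "t",
})


def strip_diacritics(raw_bytes):
    """Decode UTF-8 bytes and replace the mapped characters (hypothetical helper)."""
    text = raw_bytes.decode("utf-8")
    return text.translate(NODIA)


name = b"R\xc3\xa2\xc5\x9fca"
print(strip_diacritics(name))
```

For a large file, the same table can be applied line by line after opening
the file with an explicit encoding, so the whole thing need not sit in
memory at once.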

Cheers,
Chris
--
Cue someone quoting Zawinski.
http://blog.rebertia.com
