
List:       9fans
Subject:    [9fans] grëp (rhymes with creep) and cptmp
From:       Jason Catena <jason.catena () gmail ! com>
Date:       2009-11-29 19:01:33
Message-ID: d50d7d460911291101k7420eb0fna61f87646606e991 () mail ! gmail ! com


I wrote a wrapper around grep to search for words regardless of
accents.  I didn't want to worry about whether I used accents on
characters (I sometimes use them inconsistently, and other writers
decidedly do), but I still wanted to limit the results to exact matches
when I supplied an accent.  Here's an example run.


$ grep facade word
treatment <a museum's east facade>.  A false, superficial, or artificial

$ grëp facade word
89: to bow to man. façade. circa 1681.  French façade, from Italian
92: treatment <a museum's east facade>.  A false, superficial, or artificial

$ grëp façade *
style:21: crucial difference to pronunciation: cliché, soupçon, façade, café,
wabisabi:51: or the crumbling stone façade of an old building.   Transience,
word:89: to bow to man. façade. circa 1681.  French façade, from Italian


Note that line word:92 (output by the second command) is not output by
the third command: I supplied an accent on that particular character
(ç) in my input pattern, and line 92 spells facade without one.  I
chose the umlaut or diæresis to remind me that grëp passes -n to grep
by default, so I'll get a line number and a colon in the output.  (I
should probably just pass through all of grep's command-line options.)
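
The matching rule above can be sketched in Python as a rough illustration; it is not the author's rc implementation. The CLASSES table and the functions widen and grep_n are hypothetical, the sample classes stand in for the attached charclass file, and patterns are assumed to be plain words (a bracket expression in the pattern would be mangled by the widening).

```python
import re

# Hypothetical sample classes; the real ones come from the attached charclass file.
CLASSES = {
    "a": "[aàáâãäå]",
    "c": "[cç]",
    "e": "[eèéêë]",
}


def widen(pattern: str) -> str:
    """Replace each unaccented letter that has a class with that class.

    Accented letters are left alone, so they still match only themselves.
    Assumes a plain-word pattern with no bracket expressions of its own.
    """
    return "".join(CLASSES.get(ch, ch) for ch in pattern)


def grep_n(pattern: str, lines: list[str]) -> list[str]:
    """Return matching lines prefixed with 1-based line numbers, like grep -n."""
    rx = re.compile(widen(pattern))
    return [f"{i}:{line}" for i, line in enumerate(lines, 1) if rx.search(line)]


text = [
    "French façade, from Italian",
    "a museum's east facade",
]
print(grep_n("facade", text))  # both lines match
print(grep_n("façade", text))  # only the accented spelling matches
```

Because only the unaccented letters are widened, typing ç pins that position to ç alone, which is why the accented search drops the plain "facade" line.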


<grëp>=
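#!/usr/local/plan9/bin/rc
# grëp: grep -n for a pattern, letting unaccented letters match accented ones

regex=$1
shift

# Turn each charclass line (read from the current directory), e.g. [cç],
# into a sed command such as s/c/[cç]/g; lines containing - are skipped.
classes=`{cptmp classes}
sed '/-/d;s,^\[(.),s/\1/\[\1,;s,$,/g,' charclass > $classes

# Widen the pattern with those commands, search, then remove the temporary file.
grep -n `{echo $regex | sed -f $classes} $*
st=$status
rm -f $classes
exit $st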


I translate each ordinary Latin character in the input pattern (e.g.
[0-9A-Za-z]) into a character class taken from the attached charclass
file (which doesn't cut and paste well), and then call grep with the
updated pattern.  The first sed command in grëp turns the character
classes in charclass into s commands for sed.  The charclass file
contains the square brackets because I also cut and paste from it when
I need a character class for a sed script.
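
The charclass-to-sed step can be mirrored in Python to show what the generated script looks like. This is a sketch under assumptions: charclass_to_sed and apply_sed are hypothetical names, the sample lines stand in for the real charclass file, and apply_sed assumes no class contains a slash.

```python
import re


def charclass_to_sed(lines: list[str]) -> list[str]:
    r"""Mirror: sed '/-/d;s,^\[(.),s/\1/\[\1,;s,$,/g,' charclass

    Lines containing '-' (ranges such as [0-9]) are dropped, and a line
    such as "[cç]" becomes the sed command "s/c/[cç]/g".
    """
    commands = []
    for line in lines:
        if "-" in line:
            continue  # /-/d
        line = re.sub(r"^\[(.)", r"s/\1/[\1", line)  # "[c..." -> "s/c/[c..."
        commands.append(line + "/g")  # s,$,/g,
    return commands


def apply_sed(commands: list[str], pattern: str) -> str:
    """Apply s/old/new/g commands in order, as sed -f does (no '/' in classes)."""
    for cmd in commands:
        _, old, new, _ = cmd.split("/")
        pattern = pattern.replace(old, new)
    return pattern


# Hypothetical stand-in for the attached charclass file.
charclass = ["[aàáâãäå]", "[cç]", "[0-9]"]
cmds = charclass_to_sed(charclass)
print(cmds)
print(apply_sed(cmds, "façade"))
```

Applying the commands one after another, as sed -f does, means a later command could rewrite an ASCII letter inside a class inserted earlier; the sketch keeps that behaviour rather than hiding it.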

The script cptmp creates a temporary copy of an existing file, or a
new empty temporary file, and prints its name.
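
For readers without rc, here is a rough Python sketch of what cptmp does; the function name is reused for a hypothetical Python version, $USER is approximated by the USER environment variable and $pid by os.getpid(). Unlike the rc script, where each run gets a fresh $pid, repeated calls from one Python process produce the same name.

```python
import os
import shutil


def cptmp(path: str) -> str:
    """Hypothetical Python take on cptmp: copy path to a temporary file, or
    create an empty one if path is not a regular file, and return its name."""
    tmpdir = os.environ.get("TMPDIR") or "/tmp"  # same default as the rc script
    base = os.path.basename(path)  # like basename $1
    user = os.environ.get("USER", "")  # stands in for $USER
    tmp = os.path.join(tmpdir, f"{base}.{user}.{os.getpid()}")  # $pid -> getpid()
    if os.path.isfile(path):
        shutil.copy2(path, tmp)  # like cp -p: contents, mode and timestamps
    else:
        open(tmp, "a").close()  # like touch: create the file if missing
    # Roughly chmod +wx: add write and execute bits (the umask is ignored here).
    os.chmod(tmp, os.stat(tmp).st_mode | 0o333)
    return tmp
```

Calling cptmp("classes") returns a path such as $TMPDIR/classes.<user>.<pid>. If unpredictable names matter, Python's tempfile.mkstemp is the safer choice; the sketch keeps the rc naming scheme instead.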


<cptmp>=
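#!/usr/local/plan9/bin/rc
# cptmp: print the name of a temporary copy of file $1,
# or of a new empty temporary file if $1 is not a regular file
flag e +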

if(~ $#TMPDIR 0)
	TMPDIR=/tmp
base=`{basename $1}
tmp=$TMPDIR/$base.$USER.$pid

if (test -f $1) {
	cp -pr $1 $tmp
}
if not {
	touch $tmp
}
chmod +wx $tmp
echo $tmp


Jason Catena

["charclass" (application/octet-stream)]
