List: 9fans
Subject: [9fans] grëp (rhymes with creep) and cptmp
From: Jason Catena <jason.catena () gmail ! com>
Date: 2009-11-29 19:01:33
Message-ID: d50d7d460911291101k7420eb0fna61f87646606e991 () mail ! gmail ! com
I wrote a wrapper around grep to search for words regardless of
accents. I didn't want to worry about whether I used accents on
characters (I sometimes use them inconsistently, and others decidedly
do), but I still wanted to limit the results to exact matches if I
supplied an accent. Here's an example run.
$ grep facade word
treatment <a museum's east facade>. A false, superficial, or artificial
$ grëp facade word
89: to bow to man. façade. circa 1681. French façade, from Italian
92: treatment <a museum's east facade>. A false, superficial, or artificial
$ grëp façade *
style:21: crucial difference to pronunciation: cliché, soupçon, façade, café,
wabisabi:51: or the crumbling stone façade of an old building. Transience,
word:89: to bow to man. façade. circa 1681. French façade, from Italian
Note that line word:92 (output by the second command) is not output by
the third command, since I supplied an accent on that particular
character (ç) in my input pattern. I chose the umlaut or diæresis to
remind me that grëp provides the -n option by default, so I'll get a
line number and : in the output. (I should probably just pass through
all of grep's command-line options.)
<grëp>=
#!/usr/local/plan9/bin/rc
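# First argument is the pattern; the remaining arguments are files.
regex=$1
shift
# Temporary file to hold the generated sed script.
classes=`{cptmp classes}
# Drop class lines containing - (ranges), then turn each remaining
# line [xyz...] into the sed command s/x/[xyz...]/g.
# charclass is read from the current directory.
sed '/-/d;s,^\[(.),s/\1/\[\1,;s,$,/g,' charclass > $classes
grep -n `{echo $regex | sed -f $classes} $*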
I translate each ordinary Latin character in the input pattern (e.g.
[0-9A-Za-z]) into a character class (taken from the attached charclass
file, which doesn't cut-and-paste well), and then call grep with the
updated pattern. The first sed command in grëp turns the character
classes in charclass into s commands for sed. The charclass file
includes the square brackets because I also cut-and-paste from it when
I need a character class for a sed script.
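For readers without rc and Plan 9 sed at hand, the pattern translation
can be sketched in Python. This is only an illustration of the idea,
not the script above: the CHARCLASS string is a made-up stand-in for
the attached charclass file (whose contents are not shown here), and
build_table and expand are hypothetical names. Unlike the chained sed
s commands, it rewrites the pattern in a single pass per character.

```python
import re

# Hypothetical stand-in for the attached charclass file: one bracketed
# class per line, with the plain letter first. The real file is not
# reproduced in the message, so these lines are assumptions.
CHARCLASS = """\
[aàáâãäå]
[cç]
[eèéêë]
[0-9]
"""

def build_table(text):
    """Map each class's first character to the whole class.

    Lines containing '-' (ranges) are skipped, as the /-/d command
    does in the sed script.
    """
    table = {}
    for line in text.splitlines():
        if "-" in line or not line.startswith("["):
            continue
        table[line[1]] = line
    return table

def expand(pattern, table):
    """Replace each plain letter that heads a class with that class.

    Characters that do not head a class, such as an explicit ç, are
    left alone, so they still match only themselves.
    """
    return "".join(table.get(ch, ch) for ch in pattern)

table = build_table(CHARCLASS)
print(expand("facade", table))  # f[aàáâãäå][cç][aàáâãäå]d[eèéêë]
```

With this table, re.search(expand("facade", table), line) matches both
"facade" and "façade", while the expansion of "façade" keeps the ç as
it is and so no longer matches a line that only contains "facade".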
The script cptmp creates a temporary copy of an existing file, or a
temporary new file.
<cptmp>=
#!/usr/local/plan9/bin/rc
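# Exit as soon as any command fails.
flag e +
# Default to /tmp when TMPDIR is not set.
if(~ $#TMPDIR 0)
	TMPDIR=/tmp
base=`{basename $1}
# Name the temporary file after the original, the user, and this process.
tmp=$TMPDIR/$base.$USER.$pid
# Copy the file if it exists; otherwise create an empty one.
if (test -f $1) {
	cp -pr $1 $tmp
}
if not {
	touch $tmp
}
chmod +wx $tmp
# Print the temporary file's name for the caller to capture.
echo $tmp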
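The same steps can be sketched in Python for comparison. This is an
approximation under stated assumptions, not a port: the function name
cptmp is reused only for illustration, a missing USER falls back to
the placeholder "unknown", an empty TMPDIR is treated like an unset
one, and only the owner's write and execute bits are added where
chmod +wx would also consult the umask.

```python
import os
import shutil
import stat

def cptmp(path):
    """Create a temporary copy of path, or an empty file, and return its name."""
    # Fall back to /tmp when TMPDIR is unset (or, here, empty).
    tmpdir = os.environ.get("TMPDIR") or "/tmp"
    # "unknown" is a placeholder assumption for a missing USER.
    user = os.environ.get("USER", "unknown")
    base = os.path.basename(os.path.normpath(path))
    tmp = os.path.join(tmpdir, "%s.%s.%d" % (base, user, os.getpid()))
    if os.path.isfile(path):
        # Copy contents plus mode and timestamps, roughly like cp -p.
        shutil.copy2(path, tmp)
    else:
        # Create an empty file, like touch.
        open(tmp, "a").close()
    # Simplification of chmod +wx: add the owner's write and execute bits.
    mode = os.stat(tmp).st_mode
    os.chmod(tmp, mode | stat.S_IWUSR | stat.S_IXUSR)
    return tmp
```

As in the rc script, the name is predictable and is the same for every
call with the same basename inside one process; Python's
tempfile.mkstemp is one way to get a unique name instead.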
Jason Catena
["charclass" (application/octet-stream)]