'Fun with embedded NUL bytes in sed.'

List:       busybox
Subject:    Fun with embedded NUL bytes in sed.
From:       Rob Landley <rob () landley ! net>
Date:       2006-02-26 6:27:32
Message-ID: 200602260127.32559.rob () landley ! net

The GNU sed implementors are sneaky.

So I've been banging my head against the wall trying to figure out how to 
properly handle embedded nul bytes in the sed data, when regexec() works on a 
nul-terminated string.  Take these test cases, for example.  Here's what GNU 
sed produces when faced with nul bytes in its input:

echo -e "\0woo" | sed "s/woo/thingy/"
thingy
echo -e "woo\0woo" | sed "s/woo/thingy/"
thingywoo
echo -e "woo\0woo" | sed "s/woo/thingy/g"
thingythingy

Did they implement their own built-in regex handling that takes a length 
argument of the string to work against instead of stopping at a nul 
terminator?  (Because busybox sed ain't doing that.  Too big.)

I mean I could fake it with something like:

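/* assumes blah[len] is a terminating nul */
for(;;) {
  size_t seg=strlen(blah);     /* length up to the next nul */
  do_regexec_stuff(blah);
  if(seg>=len) break;          /* that was the last segment */
  len-=seg+1;                  /* skip the segment and its nul */
  blah+=seg+1;
}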

But that means a regex could never actually match a string section containing 
a nul byte.  Not even with wildcards.
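
For the record, a rough sketch of that fake-it loop on top of POSIX 
regcomp()/regexec() might look something like this.  The helper names 
count_matching_segments() and count_matches() are made up for illustration 
(they're not busybox or GNU sed code), and it assumes the caller guarantees a 
terminating nul at buf[len].  It only counts which segments match rather than 
doing a substitution, which is enough to show the limitation.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: run a compiled regex against each nul-delimited
 * segment of buf[0..len) and return how many segments matched.
 * Assumption: buf[len] is a terminating nul, so strlen() never runs
 * off the end of the data. */
int count_matching_segments(const regex_t *re, const char *buf, size_t len)
{
    int hits = 0;

    for (;;) {
        size_t seg = strlen(buf);   /* bytes up to the next nul */

        /* regexec() only ever sees this one segment */
        if (regexec(re, buf, 0, NULL, 0) == 0)
            hits++;

        if (seg >= len)             /* that was the last segment */
            break;
        len -= seg + 1;             /* skip the segment and its nul */
        buf += seg + 1;
    }
    return hits;
}

/* Hypothetical wrapper: compile a basic regex, count matching segments,
 * free the regex.  Returns -1 if the pattern doesn't compile. */
int count_matches(const char *pattern, const char *buf, size_t len)
{
    regex_t re;
    int hits;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    hits = count_matching_segments(&re, buf, len);
    regfree(&re);
    return hits;
}
```

Fed the 13 bytes of "abcwoo\0woodef", count_matches() with "woo" gives 2 
(one hit per segment), while "woo.woo" gives 0, because no single segment 
ever contains both halves of the pattern.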

But then I tested whether or not GNU sed actually does that either, and guess 
what?

echo -e "abcwoo\0woodef" | sed "s/woo.woo/thingy/"
abcwoowoodef
echo -e "walrus\0woowallwoo" | sed "s/wal.*/thingy/"
thingywoowallwoo

It doesn't handle that.  They're faking it themselves.

That makes life _much_ easier.  Right.  I can implement that...

Rob
-- 
Never bet against the cheap plastic solution.
_______________________________________________
busybox mailing list
busybox@busybox.net
http://busybox.net/cgi-bin/mailman/listinfo/busybox