'Re: [BusyBox] Re: [patch] Simplify the heck out of the sed newline'

List:       busybox
Subject:    Re: [BusyBox] Re: [patch] Simplify the heck out of the sed newline
From:       Rob Landley <rob () landley ! net>
Date:       2003-09-26 3:48:18

On Thursday 25 September 2003 20:34, Glenn McGrath wrote:
> On Thu, 25 Sep 2003 15:26:59 -0500
>
> Rob Landley <rob@landley.net> wrote:
> > Here's a test program to show that glibc regex is matching patterns
> > containing newlines just fine.  (I haven't tested uclibc regex, but if
> > it doesn't work and glibc does, that's a bug and I'll fix it.)
> >
> > #include <regex.h>
> > #include <stdio.h>
> >
> > int main(int argc, char *argv[])
> > {
> >   regex_t regex;
> >   regmatch_t match;
> > //  char grepstr[]="fred[123]ish";
> > //  char *string="abcfred2ishdef";
> >   char *grepstr="oom\nping";
> >   char *string="thingoom\npingb";
> >
> >   /* Each call returns 0 on success. */
> >   printf("%d\n",regcomp(&regex, grepstr, REG_NEWLINE));
> >   printf("%d\n",regexec(&regex,string,1,&match,0));
> >   /* Start and end offsets of the match within string. */
> >   printf("%d %d\n",(int)match.rm_so,(int)match.rm_eo);
> >   regfree(&regex);
> >   return 0;
> > }
>
> Interesting, it's definitely a bug in sed.c then, and the current newline
> hack is the wrong approach.
>
>
> Glenn

I'm banging on it.

I started reading the spec again from the beginning, which resulted in me 
writing up a list of tests, which resulted in me writing sedtests.py, which 
is a little python script that runs sed tests.  Here's what I have so far...

I've ripped out the newline hack entirely in my tree, and I've got a test for 
newline behavior.  Right now I'm getting test cases from the spec, and later 
(since I have to re-read a lot of the code anyway) I hope to get test cases 
from the code.  THEN, I intend to get test cases from the binutils build, and 
from previous emails you've sent me.

I just changed a lot of code (simplifying the heck out of the big sedding loop), 
and I'll just about guarantee you I broke something.  Time for a regression 
test harness, then. :)

Rob
["sedtests.py" (text/x-python)]

#!/usr/bin/python

verbose=0

tests=(

# Testing address ranges

	# Test one numeric address.
	("a\nb\nc\n","-n", "b\n", "2p"),

	# Test one regexp address (with two matches).
	("a\nb\nc\nb\nd\n", "-n", "2\n4\n", "/b/="),

	# Test $ address.
	("a\nb\nc\n", "-n", "c\n", "$p"),

	# Test numeric address range
	("a\nb\nc\nd\ne\n", "-n", "a\nb\nc\n", "1,3p"),

	# Test regexp pair address range
	("a\nb\nc\nd\ne\n", "-n", "b\nc\nd\n", "/b/,/d/p"),

	# Test regexp pair address range with two matches
	("a\nb\nc\nd\nb\ne\nd\nf", "-n", "b\nc\nd\nb\ne\nd\n", "/b/,/d/p"),

	# Test reversed regexp pair address range (second regexp never matches
	# after the first, so the range runs to the end of input)
	("a\nb\nc\nd\ne\n", "-n", "b\nc\nd\ne\n", "/b/,/a/p"),

	# Test regexp with numeric
	("a\nb\nc\nd\ne\n", "-n", "b\nc\nd\n", "/b/,4p"),

	# Test regexp with lower numeric
	("a\nb\nc\nd\ne\n", "-n", "b\n", "/b/,1p"),

	# Test reversed regexp pair (same as no second match)
	("a\nb\nc\nd\ne\n", "-n", "c\nd\ne\n", "/c/,/b/p"),

	# Test regexp and $
	("a\nb\nc\n", "-n", "b\nc\n", "/b/,$p"),

	# Test $ and regexp
	("a\nb\nc\n", "-n", "c\n", "$,/b/p"),

	# Test number and $
	("a\nb\nc\n", "-n", "a\nb\nc\n", "1,$p"),

	# Test $ and number
	("a\nb\nc\n", "-n", "c\n", "$,2p"),

# Regular Expressions

# Regular expressions can show up in addr1, addr2, and subst.

	# Test backslash-escape delimited regexp range
	("a\nb\nc\nd\ne\n", "-n", "b\nc\nd\n", "\\,b,,\\ d p"),

	# Test backslash newline in substitution regexp.
	("a\nb\nc\nd\n", "-n", "bfooc\n", "2{N;s/\\n/foo/p;}"),

	# Test backslash newline in substitution target
	("a\nb\nc\n", "", "a\n123\n456\nc\n", "s/b/123\\n456/"),

# Gnuisms

	# Test that when the last line of input has no trailing newline, the
	# last line of output has none either.  (Violates the spec: "Whenever
	# the pattern space is written to standard output or a named file, sed
	# shall immediately follow it with a <newline>.")
	("a\nb", "-n", "b", "2p")

# Notes:
	# Edge case: you can't use an embedded newline as a backslash-escape
	# delimiter.  I.E.  ("a\nb\nc\n", "-n", "b\n", "\\\\nb\\np") fails.
	# (Because scan of pattern space parses \\ as escaped backslash.)
	# No human being should ever care.  Don't do that then.
      )

import os, sys

def shell(command, stdin="", discardstderr=0):
	"""Shell out, feed stdin to the command, and return its stdout."""

	# os.popen3 exists only in Python 2; it returns (stdin, stdout, stderr).
	io=os.popen3(command)
	if len(stdin): io[0].write(stdin)
	io[0].close()
	retval=io[1].read()
	io[1].close()

	err=io[2].read()
	io[2].close()
	if not discardstderr: sys.stderr.write(err)

	return retval

count=0
for i in tests:
	count=count+1
	command="sed %s -e '%s'" % (i[1],"' -e '".join(i[3:]))
	result=shell(command,i[0])
	fail=(result!=i[2])
	sys.stdout.write("test %s: " % count)
	if verbose or fail:
		sys.stdout.write("\ndata:\n%s\ncommand: %s\n" % (i[0],command))
		sys.stdout.write("expected:\n%s" % i[2])
		sys.stdout.write("result:\n%s" % result)

	if fail: print "Fail"
	else: print "Pass"
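
The script above relies on os.popen3, which was removed in Python 3. As a rough sketch only (not part of the original attachment), the same run-and-compare step could be written with subprocess. The helper names run_filter and run_sed_test, and the sed="sed" parameter, are made up for illustration; arguments are passed as a list, so no shell quoting is involved.

```python
import shutil
import subprocess
import sys


def run_filter(argv, stdin_text=""):
    """Run argv, feed it stdin_text, echo its stderr, and return its stdout."""
    proc = subprocess.run(argv, input=stdin_text, capture_output=True, text=True)
    sys.stderr.write(proc.stderr)
    return proc.stdout


def run_sed_test(data, options, expected, *scripts, sed="sed"):
    """Return True if sed's output for the given scripts equals expected.

    The argument order mirrors the tuples in the tests list:
    (input data, options, expected output, script, ...).
    """
    argv = [sed] + options.split()
    for script in scripts:
        argv += ["-e", script]
    return run_filter(argv, data) == expected


if __name__ == "__main__":
    # Only try the example when some sed is actually on the PATH.
    if shutil.which("sed"):
        ok = run_sed_test("a\nb\nc\n", "-n", "b\n", "2p")
        print("Pass" if ok else "Fail")
```

Passing a different program through the sed parameter (for example a path to a busybox build) would let the same cases run against more than one implementation.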

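
The address range cases in the tests list all follow the two-address rule from the spec: the range starts at the first line matching addr1 and ends at the next line matching addr2, and if addr2 is a number less than or equal to the starting line, only that one line is selected. A small model of that rule, as a sketch under assumptions: select_range is a hypothetical name, Python's re module stands in for POSIX basic regular expressions, and the string "$" is always treated as the last-line address.

```python
import re


def select_range(lines, addr1, addr2):
    """Return the lines that 'addr1,addr2p' would print under sed -n.

    An address is an int (line number), the string "$" (last line),
    or any other string, which is used as a regular expression.
    """
    last = len(lines)

    def matches(addr, number, text):
        if addr == "$":
            return number == last
        if isinstance(addr, int):
            return number == addr
        return re.search(addr, text) is not None

    selected = []
    in_range = False
    for number, text in enumerate(lines, start=1):
        if in_range:
            selected.append(text)
            # The range ends on the first later line that matches addr2.
            if matches(addr2, number, text):
                in_range = False
        elif matches(addr1, number, text):
            selected.append(text)
            # A numeric addr2 at or before this line selects one line only.
            if not (isinstance(addr2, int) and addr2 <= number):
                in_range = True
    return selected


print(select_range(list("abcde"), "b", 1))  # ['b']
```

A regexp addr2 is only checked on lines after the one that opened the range, which is why "/c/,/b/p" in the tests runs to the end of input.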
