[prev in list] [next in list] [prev in thread] [next in thread] 

List:       mason-devel
Subject:    [Mason-devel] line number patch
From:       John Tobey <john.tobey () isay ! com>
Date:       2000-10-11 22:30:06
[Download RAW message or body]

Dave Rolsky <autarch@urth.org> writes:

> On 6 Oct 2000, John Tobey wrote:
> 
> > Well, the patch below seems to work for the <%init> section.  It's not
> > finished; the output contains "#line XXX YYY" where XXX and YYY are
> > supposed to point back to the object file.  A postprocessor could fix
> > that up, I suppose.  Also, I'm not happy with inefficiency that I've
> > introduced.
> 
> I don't know that a post processor is needed.

Right, it should be integrated so that it's transparent to the user.
My point is that filling in XXX and YYY has an easy, albeit kludgey,
solution, so it's not prohibitively difficult.

>  Some considerations
> though:  You'd need to count lines inside of things like <%args> and
> <%perl> sections as well.  Its doable but...

I believe my patch counted such lines.  Actually, there was a bug that
sometimes missed a line; I'll include a revised patch (against
mason-0-8-branch) below.

I don't think the problem is as hard as you suggest.  There are two
places in the parser where pos($script) advances.  I count the
newlines in the matched text after each place.

> The parser is scheduled for a rewrite.  I don't think is intended to
> happen for the upcoming 1.0 release.  However, it is probably something
> that would be good to have for the next stable release after that.

> Let me point you to some parser discussion threads in the dev list
> archives:
> 
> 1.  http://marc.theaimsgroup.com/?l=mason-devel&m=87023871900908&w=2
> I summarize what I see as the possible proposals for a rewrite and
> critique all the ones that aren't mine.  totally biased.

Fair enough.  Are you ready for an off-the-cuff critique critique?
:-)  YAPP and RecDescent are powerful parsers, but their purpose is to
parse languages with hairy, difficult syntax.  Mason's goal should be
to avoid becoming syntactically hairy and requiring heavy parsing
machinery.  After all, we are not about to parse the Perl or HTML code
within components.

Being able to parse XML input would be cool, but that should be done
in an XML-parsing framework and should not affect the standard parser
that supports bad HTML, Perl, and God-knows-what.

> In any case, please join the dev list so we can discuss this further.

Done.

By the way, this piece of the patch is separable and belongs with my
regex metacharacter patch:

@@ -216,7 +235,7 @@
 	    $state->{$t} = '';
 	}
 
-	my $comp_names = join '|', @tags;
+	my $comp_names = join '|', map quotemeta, @tags;
 
 	# Use a scalar instead of a hash key to get at the script, to
 	# work around Perl 5.6 pos() returning undef after matches.

I suppose I could send it separately if you want.

-John

Index: lib/HTML/Mason/Parser.pm
===================================================================
RCS file: /cvsroot/mason/mason/dist/lib/HTML/Mason/Parser.pm,v
retrieving revision 1.73.4.2
diff -u -r1.73.4.2 Parser.pm
--- lib/HTML/Mason/Parser.pm	2000/10/10 23:47:38	1.73.4.2
+++ lib/HTML/Mason/Parser.pm	2000/10/11 22:27:42
@@ -149,6 +149,25 @@
     }
 
     #
+    # src_file => undef disables "#line" directives.
+    # src_file => FILE, lineno => NUMBER overrides default source info.
+    #
+    $state->{lineno} = defined($options{lineno}) ? $options{lineno} : 1;
+    if (exists($options{src_file})) {
+	$state->{src_file} = $options{src_file};
+    } else {
+	$state->{src_file} = $options{script_file};
+    }
+
+    #
+    # Avoid nastiness and security holes due to strange filenames.
+    #
+    if (defined($state->{src_file})) {
+	$state->{src_file} =~ s/\n/\\n/g;
+	$state->{src_file} =~ s/"/\\"/g;
+    }
+
+    #
     # Eliminate DOS ctrl-M chars
     #
     $state->{script} =~ s/\cM//g;
@@ -216,7 +235,7 @@
 	    $state->{$t} = '';
 	}
 
-	my $comp_names = join '|', @tags;
+	my $comp_names = join '|', map quotemeta, @tags;
 
 	# Use a scalar instead of a hash key to get at the script, to
 	# work around Perl 5.6 pos() returning undef after matches.
@@ -225,26 +244,29 @@
 	my $script = $state->{script};
 
 	while ( $script =~
-		/(                     # $1: the full tag match
+		/(.*?                 # $1: whole text for line counting
+                 (                     # $2: the full tag match
                   <%
                    (?:perl_)?          # optional perl_ prefix
-                   ($comp_names|       # $2: allowed tag names plus ...
+                   ($comp_names|       # $3: allowed tag names plus ...
                     (?:def|method)      # def or method followed by anything
                     ( [^>\n]* )         # that's not '>' or a newline
                                         # (which is the name)
-                                        # $3: subcomp or method name
+                                        # $4: subcomp or method name
                    )
                   >
-                 )/xigo
+                 ))/xigos
 	      )
 	{
-	    my $section_name = lc $2;
+	    my $whole_match = $1;
+	    $state->{lineno} += $whole_match =~ tr/\n/\n/;
+	    my $section_name = lc $3;
 	    $section_name = 'def' if substr($section_name,0,3) eq 'def';
 	    $section_name = 'method' if substr($section_name,0,6) eq 'method';
 
 	    my $section_start = pos($script);
-	    my $section_tag_pos = $section_start - length($1);
-	    my $subcomp_name = $3;
+	    my $section_tag_pos = $section_start - length($2);
+	    my $subcomp_name = $4;
 	    if (defined($subcomp_name)) {
 		for ($subcomp_name) { s/^\s+//; s/\s+$//; }
 	    }
@@ -259,8 +281,11 @@
 				   startline => $startline )
 		if $curpos < $section_tag_pos;
 
-	    if ($script =~ m/(<\/%(?:perl_)?$section_name>\n?)/ig) {
-		my $ending_tag = $1;
+	    if ($script =~ m/(.*?(<\/%(?:perl_)?\Q$section_name\E>\n?))/igs)
+	    {
+		my $whole_match = $1;
+		$state->{lineno} += $whole_match =~ tr/\n/\n/;
+		my $ending_tag = $2;
 		my $section_end = pos($script) - length($ending_tag);
 		my $section = substr($script, $section_start, $section_end - $section_start);
 		my $method = '_parse_' . lc $section_name . '_section';
@@ -470,8 +495,16 @@
     my %params = @_;
 
     my $state = $self->{parser_state};
+    my $section = $params{section};
+
+    if (defined($state->{src_file})) {
+	$state->{init} .= qq(#line );
+	$state->{init} .= $state->{lineno} - $section =~ tr/\n/\n/ - 1;
+	$state->{init} .= qq( "$state->{src_file}"\n);
+	$section .= "\n#line XXX YYY";
+    }
 
-    $state->{init} .= $params{section}."\n";
+    $state->{init} .= $section."\n";
 }
 
 sub _parse_cleanup_section
@@ -1227,7 +1260,7 @@
 	return if (!-f $srcfile);
 	return if defined($predicate) && !($predicate->($srcfile));
 	my $compPath = substr($srcfile,length($source_dir));
- 	(my $objfile = $srcfile) =~ s@^$source_dir@$object_dir@;
+ 	(my $objfile = $srcfile) =~ s@^\Q$source_dir\E@$object_dir@;
 	my ($objfiledir) = dirname($objfile);
 	if (!-d $objfiledir) {
 	    if (defined($dir_create_mode)) {
@@ -1255,7 +1288,7 @@
 		if ($verbose) {
 		    print "error";
 		    if ($error_dir) {
-			(my $errfile = $srcfile) =~ s@^$source_dir@$error_dir@;
+			(my $errfile = $srcfile) =~ s@^\Q$source_dir\E@$error_dir@;
 			$self->write_object_file(object_file=>$errfile, object_text=>$objText);
 			print " in $errfile";
 		    }
_______________________________________________
Mason-devel mailing list
Mason-devel@lists.sourceforge.net
http://lists.sourceforge.net/mailman/listinfo/mason-devel

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic