'Perl hl - here is the test.pl'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       kwrite-devel
Subject:    Perl hl - here is the test.pl
From:       Anders Lund <anders () alweb ! dk>
Date:       2002-03-07 2:02:49
[Download RAW message or body]

["test.pl" (text/x-perl)]

#!/usr/bin/perl
# this is a comment
# ddd this is more's "comment
# Actually, this is a test file for the kate perl hl file.
package katesyntaxhltest;

# pod support :-)
=head1 I Love Perl

because it has a module that does exactly nothing - except
for stating that it may, someday... - beeing called less,
and loaded using the use operator.
If the perldoc: protocol has been released (and installed), type
"perldoc:less" in your konqueror address field for the full scope:))

=head1 DESCRIPTION

This script is generated during the writing of the perl.xml file, for
the purpose of testing if it works. It is of interrest ONLY if you have
interrest in the problems with kate highlight files, and probably only
accompagnied by the perl.xml.README and the perl.commented.xml file,
which is there for the same purpose.

I have added comments about the perl statements, so you can know why they
are there and sometimes comments about the trouble they produce.

If you want the full advantage, you should set colors for the following
perl types in the highlight page of the Kate configuration dialog: (in
addition to the default definitions)
String
Interpolated string
Special Variable
Function

=cut

use less;
$AUTHOR = q-Anders Lund, anders @alweb.dk-;
$VERSION = 0.1;

die "this is a sample file\n";

# code from here!
# NOTE every comment here is not nessecarily true:) TODO: fix that...
sub foo {
  return 1;
}

### Built in functions
chomp, print (split /\s+/, $_)[0] while (<__DATA__>);

### File tests too :-)
die qq{WACK!! \n} unless (-e '/usr/bin/perl' && -x _ or
                            -e qx[which perl] && -e _ );

### Special variables
$^H;       # (Internal storage of compile-time hints)
$*;        # ($MULTILINE_MATCHING, take a guess)
$_;        # (the "default" variable)
$_private; # Should not be not affected :-p
$__foo;    # :( FIXME
$#;        # (output format for printed numbers)
$/;        # You should be familliar with this one...
$@, $,, $;;# ...
$#ary;     # :-)
$123;      # If you have a regex that defines that one, i'll buy you a beer :)
@_incoming = @_; # Hehe.
@ARGV;
$firstarg = $ARGV[0];

# Not capable of seperating vars defined with similar names,
# of cause.
$ARGV = "probably legal, but you don't do that, do you?";
@INC;
@INCOOPERATED;    # :-))

%ENV, %SIG, %INC; # Yops, we have'm all :-)
$ENVCHECK = $ENV{ HOME } ne $SIG{__DIE__}; # hopefully!
die "Grow UP, pal!\n" unless ($^O =~ /(x|bsd)$/i);

# use English is not supported :( (just a q of adding a list...)

$ARG, $INPUT_RECORD_SEPERATOR;

# a simple function difinition and call
# name ending in "m" to make sure we don't assume m().
arm($things, 1);
# Doesn't happen if there's enough of them (3 is enough) :)
sub FindOption ($$$$$$$);

# Though, eg "$;" is a defined variable, whereas in
# a sub definition ($;$) the semicolon tells us that the second
# scalar is optional. There is a few similar cases. :-}

### Variables
$scalar;
@list;
%hash;
$quirq{'foo'};

$hashref->{ $part }->{"key"};
$aryref->[$subscript];

${foobar} == $foobar;
${$foobar} == "Something else";
$foo3x;
$foo4;

$hash{"key"};

# Unquoted hash element keys are rendered as simple strings.
# The detection is recursive, enabeling deep elements.
# Space between the elements is legal, but honestly, it's ugly style.
$hash{element};
$hash{ element }; # Good style:)

$hash{element}{element};
$hash { element } { element }; # Eeew

$hashref->{element};
$hashref -> {element} -> { element }; # Ydrk!!

$hashref->{ element }->{ $element };
$hashref->{ $element }->{ $element };
@{$hashref->{ key }} = [ @list ];

# Function calls in a hash element breaks the variable
# detection loop.
$hash{key_finder(@args)}{otherkey};
$hash{otherkey}{key_finder(@args)};
$arg;
# The first part of a variable is allowed to be protected by {}
# - at the cost that the first "{" and "}" is colored as variables.
${hash_ref}->{key};
@{array};
%{hash};

# This is legal theese days:
$hashref->{element}{element};

$hashref->{"jason"}{ cohen };
$hashref->{"anders"}->{lund};
$hashref->{"equal"}->{"style"};

$hash{$1};              # A fairly common
$hashref->{ $_ };       # usage of special
$hashref->{$ARGV[0]};   # variables is inside
$hash{$_[1]};           # hash elements or
$array[$_];             # array subscripts

# The variable detection eats method calls qualified
# by an object variable. Regretable - maybe?? - could be
# fixed if there's parens, otherwise not so easy..
# (NOTHING can't be fixed :-])
$dbmobj->FETCH($data);
$obj->method();
$obj->method;

# Some pattern constructs inside hash element keys are gobbled up
# as long as we can't share contexts. Do *you* use such constructs?
$hashref->{/pattern/};          # pattern not detected
$hashref->{$foo =~ s/bar/baz/}; # works, the variable detection
                                # is broken by the "=".
$hashref->{($foo = $goo) =~ s/bar/baz/}; # fine
$hashref->{m(google)};          # pattern not detected, but no bad break
                                # because the ( breaks the variable detection

$hashref->{m/foobar/};           # Ooops.
$hashref->{(/bar/)};             # pattern not detected, no bad break

# stupid way of dereferencing
$$ref;
$$$ref;
$$$$ref;

$ary[19];
"legal, but silly string";
$_[19] = "/usr/share/doc/HTML/index.html";
$path = "/bin:/usr/bin"; #"#
@list;    # perl's own thingy
%foo{'bar'}; # that one too :)
my ($a, $b, $c);
$#foo;
# Fully qualyfied variable name :)
$package::var::name;
# Old style qualification, do not do this, it is back from perl 4 days!
$package'thingy; ####'

# while this just breaks, but ok, still illegal:
# Some common variable constructs, often used within strings, but also
# in normal code. I don't detect these in strings just yet, whenever
# a context can be shared - which means that a BACK keyword can be given
# to individual rules - this will be possible.
${$variable};
${"$variable.\$$othervariable"};
${"$variable\::something"}; # seen in DBI
@{$ref->{'key'}} = [ qw(foo bar baz) ];
%{$package::name};

$test[3], $test;

# the letters [smqxwr] may denote quoting/pattern/ constructs.
$arm{$shoulder};
$arm[0];
$ary[$num];
$ary{$thing};
$ars[0];
$ars{hole};
$arq[0];
$arq{"etype"};
$arw[0];
$arw($arg);
$arx[0];
$arx{'spelling!'};
$ar[0];
$ar{ray};
# :-)

### Quote like constructs.
# The first one fails, as it is not
# checked for nested *paren sets :(
$re = qr{$foo[abc]\w{5} \d+};
# works here :-)
$subst = s{\w+ (\d{2})}{$1}g;

$re = qr/something/;
@my_processes = qx/ps -aux/;
$re3 = qr(\W$foo);

# Find cheap stuff for the British :)
$re = qr£1\.50£;

# qw// lists are rendered as "normal text", because the lack of
# a resonable checking of all delimiters, meaning \W - any non-word
# non-digit character, including $@%;!. It may be better to render it
# as a interpolated string, when i get to write that method in the
# hl object...
@varlist = qw/$foo $bar $baz/;
@wordlist = qw/this is a list of
               words seperated by
               some amount of space/;
print join(" ", @wordlist);
# this is a list of words seperated by some amount of space

### various PATTERN constructs
# a blank /PATTERN/ :-)))
   /PAT/;
$string =~ tr/abc/d/Dc;
$string =~ y!/!#!;
# (extended parsing of [m]// with slashes only, see below)
$res =~ m/foo\d{3,6}(\W+)[6u\s]*/;

$all_non_word_chars = $1;
$res = m(MATCHED);
$res = m{PATTERN};
$res = m[PATTERN];
$res = m/PATTERN/;
$res = m,PATTERN,;
$stars =~ s[(\*)][$1];

# In // patterns, and in supported multiline patterns, variables
# will be detected, if they are not escaped with a \\
/foo$bar\$baz/;
/foo$bar\/$baz/;
# "\" is escaped too
/foo$bar\\$baz/;

# editing perl.xml in Kate, using the filter plugin
# with perl -e when I discovered a forgotten context:
s/"(\d+)"/q!"!.($1>4?$1-1:$1).q!"!/eg;
# (then safely remove old unused context 4)

# Theese are mostly legal constructs. I *did* "use less", right?
# here, they represent places where a pattern or "quote like
# delimiter" is OK.
$foo, /$bar/g;
$foo & (s/bar/baz/i);
$foo & q{string};       # also legal :-P
$foo << m,bar,;
$foo *= m/bar/;         # $foo = 0 unless /bar/;
$foo /pattern/;         #/# ok, illegal.
$foo m/pattern/;        # Illegal, slips through.
(/$pattern/);
(m/$pattern/);
( q{string} );

if (@files = qx/ls -l/) { print qq$Hey! "qx//" works :-)\n$ };
$_ = "you are in " . qx/pwd/, print;
                         # print working direcotry in a wack way :)

# Detecting s/// constructs in perl.xml
# See below for the problem with this approach - and tell me a way
# to solve it :=P
/(?:s|tr|y)([^a-zA-Z0-9_\s[\]{}()]).*\1.*\1/;

# s/// patterns with fun delimiters
if (s,foo,bar,i) {
  s!/file!/name!;
  s#/##;
  s;foo;bar;; print;
}
$_ =~ s/foo#/bar/;

### Experimental detailed PATTERN parsing
# [m]// patterns has internal regex pattern operator detection.
# It detects character classes, quantifyers and assertions - and extensions

    /\d{2} [a-zA-Z_]\s*[\w\s]+?$/;
    /\Q$foobar\E:\s+(Linux|\w+?BSD)\s+(\w*)\s+\(\d\d\.\d\d\)$/;

# The above is definately easier to interpret than the below:

    m%\d{2} [a-zA-Z_]\s*[\w\s]+?$%;
    m!\Q$foobar\E:\s+(Linux|\w+?BSD)\s+\w*\s+\(\d\d\:\d\d\)$!;

# POSIX classes, {}+*? quantifiers
/[:alpha:]*[:punct:]\s?\w+\s*\d{1,3}/;
# Negated classes
/[^\w\s{}()[\]<>][:^cntrl:]/;
# Character class with embedded POSIX class from the perlre manpage
/[01[:alpha:]%]/; # zero, one, any alphabetic character or %
# Negating the embedded POSIX class fails :(
/[\w+[:^graph:]]/;

# Perl regex extensions :-)) This checked
# by asking for "(?" after backslash escape, it will probably work -
# "?" after an unescaped "(" is an error unless followed by
# one of :=!<>{. The code extensions is un-accounted for, and may
# look weird

$ok++ if /\d+?(?#beer only in bar)(?<=bar\s)beer(?# hehe)/x;
# Samples from the perlre manpage
    / (?> \# [ \t]* ) (        .+ ) /x;
    /     \# [ \t]*   ( [^ \t] .* ) /x;
    /(?>\#[ \t]*)(        .+)/x;
    /#[ \t]*(?![ \t])/x;
# Code - does not work well at this point!
# Code may need a paren matching feature to work safe, and there
# are other problems, such as getting sufficient rules :(
/(??{ $i = 0 }) ... /;

# I chose to enable comments, using /\s{3,}#.*$/, which mostly works:
/^some text ..  #.*$/;
# The string I get at this point of Kate development does not
# seem to know about EOL, hopejully we can get a better check later :-]

# Here is a really cool one from Text::ParseWords:

	($quote, $quoted, undef, $unquoted, $delim, undef) =
	    $line =~ m/^(["'])                 # a $quote
                        ((?:\\.|(?!\1)[^\\])*)   # and $quoted text
                        \1 		       # followed by the same quote
                        ([\000-\377]*)	       # and the rest
		       |                       # --OR--
                       ^((?:\\.|[^\\"'])*?)    # an $unquoted text
		      (\Z(?!\n)|(?-x:$delimiter)|(?!^)(?=["']))
                                               # plus EOL, delimiter, or quote
                      ([\000-\377]*)	       # the rest
		      /x;		       # extended layout

    # I have a problem with comment-ONLY lines in
    # patterns !?!
/
  (?:           # non cpaturing group
     [^()]+     # of non-parens
   # comment &%¤##& buckled SHIT
# WACK!!! All space is cut of the string before
# passing it to context rules, disabling
# the truthvalue of linestart by mistake :( (katehighlight.cpp, 788.)
   |            # or
     \)
   )
/x;

# Get rid of emacs leftovers now that we have nice perl syntax hl in kate :-)
while (<DIR>) { unlink if /(?#possible "\"??)^\\?#[\w]+\.(?:pl|pm)#$/ }
$foo[$tal << /pat/];     # beutiful!
$foo{/bar/ eq 'Silly'};  # yopper, perl won't complain
                         # (as long as $_ is defined) ;^)
                         # The pattern is unfortunately eaten by the
                         # variable detection.
# The one supported multiline s/// uses pairs of braces:
s{
   \$foo$bar
 }
 {
   $bar
 }x;

# The /x is assumed, as the hl implementation chews the text
# top down. When we can tjeck that, we can make the editor beep
# when invalid code is entered, which will remind me how much i
# hated the VB editor untill i digged the switch out of the
# settings... :-\

# Calculation statements including "/".
# The problem: do not start a "/PATTERN" at the "/". This gave
# me a few new grey hairs, I solved it by chewing up \s*[)}\]]?\s*/ or
# one single character after any numeric character, a similar technique
# is also used on variables. I can change it if I can get the Operators
# list to work...
$calc = 22 / $foo;
$calc = 22 / 7;
$calc = $1 / 200;
$calc = $foo / 100 / 1;
$calc = $size / 1024 / 1024;
$calc = 1.559 / 0xAf / 14;
$calc = $foo++ / 10;
$calc = ( 3 + $foo ) / 11;
$calc = $foo{'bar'} / 44;
$calc = $foo{$bar}/44;
$calc = (2 + $foo{'bar'} ) /44;
$calc = $foo[11] / 44;
$calc = (2+$foo[12]) / 44;
$calc = ($foo-- / 100) / 11;
$values{"foo"} /= 2;
$values{foo} /= 2;

$foo and $bar;

$foo if $bar;

$number /= 4;
$foo / $bar;
Package::Method("arg") / 1;

# there is a constant pragma too, folks using that one may
# be likely to have a PI and a E amongst others...
DEFINED_THINGY / 3;

# Did all kinds of numbers survive?
$oct = 0644;
$hex = 0xAf;
$float = 3.14;
# Math::Complex offers a function name ending in a integer
	my $theta_inc = pit2 / $n;
	my $rho = $r ** (1/$n);
# :-) Gotya

# String constructs.
# Perl supports two kinds of litteral
# strings, a dum one and a interpolated one, pr default
# denoted using "'" and """ delimiters.

$sillystring = 'a string that does not support $interpolation';
$cleverstring = "this $string is @interpolated!";

# Interpolated strings supports string operator detection
$string = 'intelligent';
print "\t\uin fact, these strings are \Uvery\E\n\t\U$string\E, \LRIGHT\E \lPal?\n";
# The above prints
# '	In fact, these strings are VERY
#  	INTELLIGENT, right pal?
# '

# Supported operators are
qq[\U\u\L\l\E\n\t\r\e\a\f];

# The variable detction within strings is nowhere near the
# quality of the main variable detection loop, because the
# current implementation of the syntax highlight system in Kate
# would require the entire variable detection ruleset repeated for
# each string delimiter set allowing for interpolation.
$hashref->{advanced}->{${$options}};
"$hashref->{advanced}->{${$options}}"; # Ooops...

# Just to make life more fun for syntax highlight file authors,
# both kinds of strings can use any \W delimiter, using the
# q// and qq//, where as allways any pair sets can be used as such.
# This file supports only paren-like paired delimiters, some more rarely
# (never?) used pairs are unsupported.
$test = qq{a @string};
$test = q$another string$;

$test = qq/$foobar is foo bar/; # interpolation - variable detection - is only
                                # supported for  paired delimiters so far :(
$test =  qq/a multiline interpolated $string using
     slashes as delimiter/; # Oops - multiline is only
                            # supported for paired delimiters yet :(

$test = "here is an \$escaped variable";
$test = q{this is a silly $thing too, but
it works with multiple lines};

$test = q{a construct may contain nested { paren sets } of the same type};

$test = qq{this is a $cool one:
an interpolated $string with multiple @lines, they are not
%indexed, and this is the $#last one\n};

$test = qq{this string contains a $hash->{$member} which should not break the it!};
$test = qq( a string with (nested) and \( escaped \) parens will work);

# escaping is only supported for paried delimiter strings (and "'" and """)
# due to the missing option to set a variable. Aditionally, the greedy
# regexps eats up to the LAST delim character on the line :(
$test = qq#this is a double quoted $string too, but \# even if it looks so #;
$test = q!escaping does not work, it is just the greedy regex\! !; ! # see :(

# Luckily, this is not a big problem, as most of us choose a delimiter that
# does not appear in the string.

# The lefthand operands of the => operator cannot be detected just now :(
%couple = (
    Tarzan => "me",  # Here, "Tarzan" is a string
    'Jane' => "you", # same kind of string
);

# backtick strings are left alone, they are rare, and most likely a command
# will look just fine anyways. (these days we use qx// anyways, right?)
$startedpid = ·    `utime`;

# A HERE document construct. I mark the starting label, but the ending
# label is left as I need a method for setting a var for this!
# This also means that I HAVE to let the content render as "normal text",
# (as opposed to "interpolated string", for which it is often used).
# I would probably chose default anyway, as they are used for code in some modules
# and may be nested, the CGI module contains a nice example. Anyway, I would
# like to be able to keywordify the ending label - and will when I can :-)
$something = << END_OF_IT;
if ($bar) {
  $num++;
}
# the next line ends assignment to something
END_OF_IT

# File handles gets keywordifyed inside angle brackets ONLY, and ONLY
# if they are all uppercase \w characters,  so I avoid a clash with HTML
# in HERE documents.
# (While we are waiting for a way to employ alien hl syntax...)
my $file = "$ENV{HOME}/.bash_history";

if (-e $file && -f _ && -r _ ) {
  my (%commands, @sortedcommands);
  open FILEHANDLE, $file or die qq{Eeew! Couldn't open "$file": $!\n};
  chomp, $commands{$_}++  while (<FILEHANDLE>);
  close FILEHANDLE or die qq{Spooky! couldn't close "$file": $!\n};

  @sortedcommands = sort { $commands{$a} <=> $commands{$b} } keys %commands;
  if (@sortedcommands) {
    print 'Your favourite command is: "',
      $_ = $sortedcommands[-1], qq[($commands{$_}" times so far)\n];
    print "\uhow come I don't see the word 'perl' in that?!\n" unless /perl/;
    print scalar(keys %commands),
      " commands in history! \uconsider a duplicate lines cleanup script :-)\n",
        if ($commands{$_} > 100);
  }
}
# hmm, that's a bit of real code...

package trouble;

=head1 DESCRIPTION

Here are some of the bugs that i didn't manage to solve

Allmost ALL the bugs can be be related to the limited capabilities
of the syntax highlight system provided, here's a list of what I
think would solve the remaining problems:

By far the worst problem is the allways-greedy matching of * and +
quanitfyers in regexps, causing single line string and PATTERN
constructs to break. This is probably also the easiest one to solve,
as adding that parameter to the RegExpr command should be easy.

The second is the lack of a nesting detector, which should be pretty
easy and cheap too.

The third is the missing opportunity for setting a variable from
within the xml file, which would make fully detecting ALL PATTERN
constructs a breeze :-). Should be possible, deciding on the right way
is the most difficult.

On the wishlist is the ability to look ahead, specificly in regexps
to decide based on the flags if we go string or code in the second
part of a s///ecgimosx construct, but that would most likely be
pretty expensive... so let's leave that.

=cut

# This big s/// is from Pod::Html, and shows a problem with
# the lack of a nest-detecting method in the hl implementation:
# The second part of the construct, which is evaluated as code
# breaks due to the braces in the code.
     $rest =~ s{
		 (<A\ HREF="?) ([^>:]*:)? ([^>:]*) \.pod: ([^>:]*:)?
               }{
                  my $url ;
                  if ( $htmlfileurl ne '' ){
		     # Here, we take advantage of the knowledge
		     # that $htmlfileurl ne '' implies $htmlroot eq ''.
		     # Since $htmlroot eq '', we need to prepend $htmldir
		     # on the fron of the link to get the absolute path
		     # of the link's target. We check for a leading '/'
		     # to avoid corrupting links that are #, file:, etc.
		     my $old_url = $3 ;
		     $old_url = "$htmldir$old_url" if $old_url =~ m{^\/};
 		     $url = relativize_url( "$old_url.html", $htmlfileurl );
	          } else {
		     $url = "$3" ;
		  }
		  "$1$url" ;
	       }xeg;
# As I only support s{}{} for multiline s/// constructs,
# you can't even do better with a different delimiter.
# The solution is changes to the hl implementation, where
# allowing to set variables from the XML file would make
# supporting all delimiters a resonable concept.

   # Problem statement from Pod::Text.
   # The problem is the missing nongreedy regex setting
   # in the hl system, which makes the stringrendering
   # extent the actual string, 'Revision: 2.3 '.
   # As the start of the s/// pattern is chewed up, and
   # the final "/" starts a /PATTERN/ in the rendering, which
   # is continued until another "/" is found. Eew.
($VERSION = (split (' ', q$Revision: 2.3 $ ))[1]) =~ s/\.(\d)$/.0$1/;
   #/# <- untrouble me
   # As you see below, the string is OK if it hasn't been the
   # repeated delimiter.
($VERSION = (split (' ', q$Revision: 2.3 $ ))[1]);
   split (' ', q$Revision: 2.3 $ ))[1];
   q$Revision: 2.3 $;
   # choosing another delimiter solves the problem, but this example
   # represents a clever way of building a $VERSION variable involving
   # reuse of the Revision string provided by RCS, which is quite
   # common. Anyway, here it is in a changed version:
($VERSION = (split (' ', q!Revision: 2.3 ! ))[1]) =~ s/\.(\d)$/.0$1/;

1; # (This file will not compile, as it contains a few illegal statements)
# Perl special __DATA__ filehandle reads the contents of the
# __DATA__ section. it is rendered as normal text, and only
# pod commands are detected.

__DATA__

Here all should be 1 rendering, no @vars, numbers 123 etc.

foo	100	John
bar	101	Eric
baz	102	Anthony

=head1 MINOR PROBLEM

If pod left in the __DATA__ or __END__ sections, we return to context 0,
default, so unless the next line is the __DATA__ or __END__ section, text
gets highlighted as code. That will probably never happen, so I didn't add
specific pod rules to theese contexts, maybe when the hl implementation
improves :-)

=cut

__END__

The perl code has now ENDed, and anything below this point is just some
characters, except for pod commands, which is all we look for.

Many module programmers chose to put the pod in the __END__ section, as it
saves the interpreter from discarding the lines, and the documentation in
many modules takes up more bytes than the code it self :-)

Here, $nothing should be rendered other than just the text in the configuration
chosen by the user - comment style by default. 123

=pod

This is just pod

=cut

_______________________________________________
kwrite-devel mailing list
kwrite-devel@mail.kde.org
http://mail.kde.org/mailman/listinfo/kwrite-devel

[prev in list] [next in list] [prev in thread] [next in thread]