List:       kwrite-devel
Subject:    Re: look-behind assertions in syntax HL?
From:       "Steven J. Long" <slong () rathaus ! eclipse ! co ! uk>
Date:       2014-02-14 8:49:43
Message-ID: 20140214090303.GB4479 () MLrathaus ! eclipse ! co ! uk

On Wed, Feb 12, 2014 at 01:41:56PM -0500, Matthew Woehlke wrote:
> Are there any plans to add look-behind assertions to kate's syntax HL 
> engine? There are some bugs in at least the CMake and reST HL's that I 
> don't believe can be fixed in any way except by look-behind assertions.
> 
> In particular, problems arise when I need to start a context based on 
> preceding characters that may have already been consumed by some other 
> match rule. For example (CMake):
> 
>    foo(${a}"b")
> 
> The above is legal CMake, but while the '"' does in fact start a string, 
> it is *also* effectively escaped. Therefore I would like to have it 
> start a different context, but in order to do this I need a look-behind 
> to see if the previous character is '[^\s&quot;]'. I can't just write 
> this as a rule consuming the character, because it may have already been 
> consumed (as in the above example, by "Detect Variables").
> 
> There are similar problems with &inlinestart; in reST, which also should 
> be a look-behind assertion.
> 
> Obviously QRegExp does not support look-behind assertions, and anyway 
> due to the 'consumed by previous context' rule I'm not sure if a 
> look-behind in QRegExp would work anyway. Therefore what I would propose 
> is a new XML attribute (probably only for RegExpr) "lookBehind", which 
> would take a regex that would have '$' implicitly appended (probably 
> with an implicit rule that the regex may not contain a look-ahead 
> assertion). If present, after otherwise matching a RegExpr, kate would 
> take the text preceding the match and test if the look-behind also 
> matches before considering the rule a match.
> 
> Comments? Thoughts? Existing ways to accomplish the task?

I've done similar things in makefile2.xml, which is what I started with
but haven't tried to push, since I got lost with the stack bug and a
newer version that seems pretty clean came out. It was complex stuff, as
I was interested in having shell highlighting (taking inspiration
from bash.xml, as so many of us have) and also wanted full gmake support.
Supporting $(shell cmd ..) made it even moar fun and fried my brain ;)

In essence, you do it in the check for the terminator of the variable
name, in this example. I.e. do a Detect2Chars for "}&quot;" and "}'" if
both characters get the same highlighting, e.g. as operator. Otherwise do
a lookahead RegExpr, "\}(?=&quot;)" and "\}(?=')", if you want to jump
straight to specific contexts (that consume the " or '), or combine them
if you go to one context which will match the quote and switch:
"\}(?=[&quot;'])". The latter is preferable, otherwise you have to match
there and switch to a sub-context; in this case it terminates with the
start char, but even when there's a pair, e.g. of brackets, you have to
consume/match the first and then switch.

Whichever way, make *sure* the rule comes before the DetectChar/rule
for the terminator ("}" here), or that will take precedence and yours
won't have any effect. You won't be warned, it just won't do anything.
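
A minimal sketch of that ordering, assuming hypothetical context names
(VarSubst, EscapedString) and itemData names (Variable, Operator) that
are not taken from any shipped syntax file:

```xml
<context name="VarSubst" attribute="Variable" lineEndContext="#pop">
  <!-- Must be listed first: "}" directly followed by a quote. The (?=...)
       assertion leaves the quote unconsumed, so EscapedString can match it. -->
  <RegExpr attribute="Operator" context="#pop!EscapedString"
           String="\}(?=[&quot;'])"/>
  <!-- Alternative when "}" and the quote get the same highlighting:
       <Detect2Chars attribute="Operator" context="..." char="}" char1="&quot;"/> -->
  <!-- Plain terminator. If this came first, it would always win. -->
  <DetectChar attribute="Operator" context="#pop" char="}"/>
</context>
```

The "#pop!EscapedString" target assumes you want to leave the variable
context before entering the string one; a plain context name works too
if you'd rather stack them.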

You can also get away with using one context if the type of quote
makes no difference to the escaping and highlighting, via the dynamic
attribute, on the regex and the context it goes to. Personally I find
it hard enough without using it, so someone else will have better
knowledge. It's essential for things like << SOMEWORD for EOF in
shell, though, where the state ends based on a user-defined token
(luckily at char 0 in this case.)
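
Roughly, the dynamic approach for the quote case could look like the
following. This is only a sketch with made-up context names (AfterVar,
QuotedString); the capture in the first rule is what %1 refers to in
the dynamic context:

```xml
<context name="AfterVar" attribute="Normal Text" lineEndContext="#pop">
  <!-- Capture whichever quote opens the string. -->
  <RegExpr attribute="String" context="QuotedString" String="([&quot;'])"/>
</context>

<context name="QuotedString" attribute="String" lineEndContext="#stay"
         dynamic="true">
  <!-- %1 is replaced by the captured quote, so the string only ends
       on the same character that started it. -->
  <RegExpr attribute="String" context="#pop" String="%1" dynamic="true"/>
</context>
```

The shell here-document case works the same way, with the captured word
standing in for the quote.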

If whitespace is allowed in-between the } and " then you'll have to
use a regex. (odd choice if so.)

If you think about it, that's really what a lexer for the language
in question would have to do (lookahead on terminator), given that
there's different string escaping happening.
So: STATE "}"/[ \t]*["'] or: STATE "}"/["']  vs: STATE "}"
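
In Kate terms, the whitespace-tolerant variant would be something like
the rule below (a sketch; EscapedString is a hypothetical context that
would then have to skip the blanks itself before matching the quote):

```xml
<!-- STATE "}"/[ \t]*["'] : "}" followed by optional blanks and a quote -->
<RegExpr attribute="Operator" context="#pop!EscapedString"
         String="\}(?=[ \t]*[&quot;'])"/>
```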

If it only applies within parentheses, you have my sympathy, but
just make another context; IncludeRules becomes essential. (I label
contexts that are only ever included with lower_case, and ones I jump
to with CamelCase, but have been known to resort to _foo and _Foo in
makefile.) I also leave off attribute and lineEndContext for
include contexts; the validator complains but kate doesn't, and it's
a lot simpler to read and distinguish, ie maintain, and less cruft.
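
To illustrate the naming and IncludeRules layout, here is a small sketch
with invented context names; the included context deliberately omits
attribute and lineEndContext, which is exactly what the validator will
complain about:

```xml
<contexts>
  <!-- Jumped to: CamelCase, with attribute and lineEndContext. -->
  <context name="Normal" attribute="Normal Text" lineEndContext="#stay">
    <DetectChar attribute="Operator" context="InParens" char="("/>
    <IncludeRules context="var_rules"/>
  </context>

  <context name="InParens" attribute="Normal Text" lineEndContext="#stay">
    <DetectChar attribute="Operator" context="#pop" char=")"/>
    <IncludeRules context="var_rules"/>
  </context>

  <!-- Only ever included: lower_case, no attribute or lineEndContext. -->
  <context name="var_rules">
    <RegExpr attribute="Variable" context="#stay" String="\$\{\w+\}"/>
  </context>
</contexts>
```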

Admittedly I only found out about the correct validator last year, after
5 years of wrestling with XML. The old XML Plugin complains about
that too, though: I just ignore those warnings, since kate has always
been fine about it. After all it doesn't use those attributes of
the Context when it includes its rules; just what the rules
themselves say.

I was actually kinda disappointed when I was exploring how things
worked: taking the lineEndContext would be insane, but it would be nice
to take a default attribute sometimes. I won't try to recall the
detailed example, since I've gone on enough, but ISTR it was to
do with shell identifiers.

HTH,
igli.
-- 
#friendly-coders: We're still here for you™ ;-)
