'Re: [antlr-interest] simple grammar with wildcard'


List:       antlr-interest
Subject:    Re: [antlr-interest] simple grammar with wildcard
From:       jason zhang <jasonzhang2002 () gmail ! com>
Date:       2008-04-25 3:26:36
Message-ID: 48114F6C.5050507 () gmail ! com

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
  <title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Hi all,<br>
I got this solved.<br>
The grammar looks like this:<br>
----------------------------<br>
tokens<br>
{<br>
&nbsp;&nbsp;&nbsp; DQ='"';<br>
&nbsp;&nbsp;&nbsp; COLON=':';<br>
&nbsp;&nbsp;&nbsp; ATTRIBUTE;&nbsp;&nbsp;&nbsp; <br>
}<br>
<br>
program :&nbsp;&nbsp;&nbsp; attribute*;<br>
attribute : n=ID COLON ATTRVALUE -&gt;^(ATTRIBUTE ID ATTRVALUE);<br>
ID :LETTER (LETTER | '-')*;<br>
WS : ( '\t' | ' ' )+ &nbsp;&nbsp;&nbsp; { skip(); } ;<br>
LINEBREAK<br>
&nbsp;&nbsp;&nbsp; :&nbsp;&nbsp;&nbsp; '\r'?'\n' {skip();};<br>
ATTRVALUE<br>
&nbsp;&nbsp;&nbsp; :&nbsp; '"' ( ~('\\'|'"') )* '"'<br>
&nbsp;&nbsp;&nbsp; ;<br>
fragment <br>
LETTER :'a'..'z'|'A'..'Z'|'0'..'9'|'_';<br>
---------------------------<br>
I got the ATTRVALUE idea from the example Java 5 grammar, which defines
STRING_LITERAL. ATTRVALUE is almost the same as STRING_LITERAL.<br>
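<br>
As a rough cross-check outside ANTLR, the ID and ATTRVALUE rules can be mirrored with java.util.regex. This is only a sketch, not ANTLR-generated code: the class AttrValueSketch and its describe method are hypothetical names, and the regular expressions are an assumed transcription of the lexer rules, with WS and LINEBREAK skipped between tokens.<br>
<pre>
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR or of the generated parser.
class AttrValueSketch {
    // ID : LETTER (LETTER | '-')* where LETTER is a-z, A-Z, 0-9 or '_'
    static final String ID = "[A-Za-z0-9_][A-Za-z0-9_-]*";
    // ATTRVALUE : '"' ( ~('\\'|'"') )* '"'
    static final String ATTRVALUE = "\"[^\\\\\"]*\"";
    // attribute : ID COLON ATTRVALUE, with WS and LINEBREAK skipped
    static final Pattern ATTRIBUTE = Pattern.compile(
        "[ \\t\\r\\n]*(" + ID + ")[ \\t\\r\\n]*:[ \\t\\r\\n]*(" + ATTRVALUE + ")");

    // Returns one "(ATTRIBUTE name value)" line per attribute matched
    // from the start of the input; stops at the first non-matching text.
    static String describe(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = ATTRIBUTE.matcher(input);
        while (m.lookingAt()) {
            out.append("(ATTRIBUTE ").append(m.group(1)).append(' ')
               .append(m.group(2)).append(")\n");
            // Continue right after the attribute that was just matched.
            m.region(m.end(), input.length());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe("name: \"Jason\"\nlang-id: \"en_US\"\n"));
    }
}
```
</pre>
Because ATTRVALUE excludes the backslash, a value that contains one is not matched at all in this sketch.<br>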
<br>
thanks<br>
<br>
-jason<br>
<br>
<br>
<br>
<br>
jason zhang wrote:
<blockquote cite="mid:4810062C.8050303@gmail.com" type="cite">
Hi, Thomas, Johannes<br>
I tried various options. This seems to be the only workable grammar:<br>
-----------------------------------<br>
grammar Test;<br>
tokens<br>
{<br>
&nbsp;&nbsp;&nbsp; DQ='"';<br>
&nbsp;&nbsp;&nbsp; COLON=':';<br>
}<br>
  <br>
program : attribute*;<br>
attribute : n=ID COLON DQ .* DQ LINEBREAK{ System.out.println("match
attrname==============="+$n.text+" attrvalue=");};<br>
ID&nbsp;&nbsp;&nbsp; :&nbsp;&nbsp;&nbsp; ('a'..'z'|'A'..'Z'|'0'..'9'|'-'|'_')+;<br>
WS : ( '\t' | ' ' )+ &nbsp;&nbsp;&nbsp; { $channel=HIDDEN; } ;&nbsp;&nbsp;&nbsp; <br>
LINEBREAK<br>
&nbsp;&nbsp;&nbsp; :&nbsp;&nbsp;&nbsp; '\r'?'\n';<br>
-------------------------------- &nbsp; <br>
However, I could not capture the characters matched by .*. If I change
.* to e=.* or (e+=.)*, I get an error.<br>
  <br>
I guess I could construct an AST and walk it to retrieve the
matched value.&nbsp; Any suggestions?<br>
-jason<br>
  <br>
  <br>
Thomas Brandon wrote:
  <blockquote
 cite="mid:ebc876d70804231012y7808cdebqb69a7373a0f0488b@mail.gmail.com"
 type="cite">
    <pre wrap="">On Thu, Apr 24, 2008 at 2:53 AM, Johannes Luber <a
 moz-do-not-send="true" class="moz-txt-link-rfc2396E"
 href="mailto:jaluber@gmx.de">&lt;jaluber@gmx.de&gt;</a> wrote:
  </pre>
    <blockquote type="cite">
      <pre wrap="">jason zhang schrieb:


    </pre>
      <blockquote type="cite">
        <pre wrap="">Hi, cai
I removed the NONQUOTE definition and used a wildcard instead.
------------------------------
grammar Test;

program : attribute*;
attribute : n=ID ':' '"' e=(.*)'"' LINEBREAK{ System.out.println("match attrname==============="+$n.text+" attrvalue="+$e.text);};
ID    :    ('a'..'z'|'A'..'Z'|'0'..'9'|'-'|'_')+;
WS : ( '\t' | ' ' )+     { $channel=HIDDEN; } ;
LINEBREAK
   :    '\r'?'\n';
--------------------------------
When I test this grammar by running the generated Java code, e is not set
anywhere in the generated code, and I get a NullPointerException. How can I
capture the value matched by (.*)?

thanks

-jason

      </pre>
      </blockquote>
      <pre wrap=""> You have encountered a known bug. One can't use "e=(...)" yet. Either you
can forgo the parentheses or you have to create a new subrule.

 Johannes

    </pre>
    </blockquote>
    <pre wrap="">Or you should be able to use "(e+=.)*". However, this will generate a
list of tokens, so I don't think e.text will work. I think it should
work with a subrule.
However, that grammar won't do what you want. The wildcard in the
parser matches any token, not any character, so it won't match
characters that no lexer rule matches. You could change the NONQUOTE
rule so it doesn't include ID or LINEBREAK. Alternatively, when several
lexer rules match the same character sequence, the one specified first
wins, so if you go back to the original grammar and reorder it so that
ATTRVALUE comes after ID and LINEBREAK, those two should match first.
Then in your parser use (ID|ATTRVALUE|LINEBREAK).

Tom.

  </pre>
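  <pre wrap="">To illustrate the token-versus-character point, here is a minimal
plain-Java sketch. It is not ANTLR-generated code: TokenWildcardSketch
and visibleTokens are hypothetical names, the regular expression is an
assumed transcription of the DQ, COLON, ID, WS and LINEBREAK rules, and
stepping over unmatched characters only roughly stands in for the lexer
reporting an error and dropping them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR or of the generated parser.
class TokenWildcardSketch {
    // Assumed regex transcription of the lexer rules, in order:
    // group 1 = DQ, 2 = COLON, 3 = ID, 4 = WS, 5 = LINEBREAK.
    static final Pattern TOKEN = Pattern.compile(
        "(\")|(:)|([A-Za-z0-9_-]+)|([ \\t]+)|(\\r?\\n)");

    // Lists the tokens a parser would see, separated by single spaces.
    static String visibleTokens(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(input);
        // find() steps over characters that no rule matches.
        while (m.find()) {
            if (m.group(4) != null) {
                continue; // WS is on the hidden channel, invisible to the parser
            }
            if (out.length() != 0) {
                out.append(' ');
            }
            out.append(m.group(5) != null ? "LINEBREAK" : m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(visibleTokens("title: \"hello, world!\"\n"));
    }
}
```

In this sketch the comma, the exclamation mark and the spaces never show
up as tokens, so a parser-level wildcard between the two DQ tokens would
only ever see the ID tokens hello and world.
  </pre>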
  </blockquote>
  <br>
</blockquote>
<br>
</body>
</html>