List:       antlr-interest
Subject:    [antlr-interest] White spaces within token definition
From:       "Haralambi Haralambiev" <hharalambiev () gmail ! com>
Date:       2008-04-25 11:57:21
Message-ID: aa71f710804250457w6a25787elce1569ac3a3eba3e () mail ! gmail ! com

Hello,

I have run into a problem that has some workarounds, but I am puzzled
about why it happens.
(I searched for a similar question but was unable to find one. I am sorry
if this has been answered elsewhere; if so, please point me to the link.)

Consider the following lexer grammar:
---------------------------------------------------
lexer grammar test;

CMD_EXIT : 'COMMAND EXIT';
ID : ('A'..'Z'|'a'..'z')+;
WhiteSpaces : (' '|'\t')+ {$channel=HIDDEN;};
---------------------------------------------------

The language being recognized has many commands with the syntax
"COMMAND <name of the command>", but I am interested only in the exit
command, so I treat "COMMAND EXIT" as a single token.
However, I would like "COMMAND <something else>" to be matched as a
sequence of two ID tokens.
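
To make the desired result concrete, here is a small Python sketch (not
ANTLR, and not a proposed fix for the grammar) that produces the intended
token sequences. The name desired_tokens and the regex are assumptions made
for illustration only; the token names mirror the grammar's rules.

```python
import re

# Illustrative only: try the literal "COMMAND EXIT" first, then fall back
# to ID. The regex engine backtracks when the literal does not match.
TOKEN_RE = re.compile(
    r"(?P<CMD_EXIT>COMMAND EXIT)|(?P<ID>[A-Za-z]+)|(?P<WhiteSpaces>[ \t]+)"
)

def desired_tokens(text):
    """Return (type, text) pairs for visible tokens; whitespace is hidden."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at column {pos}")
        if m.lastgroup != "WhiteSpaces":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(desired_tokens("COMMAND EXIT"))
print(desired_tokens("COMMAND XYZ"))
```

The first call yields a single CMD_EXIT token, the second yields two ID
tokens, which is the behaviour I would like from the lexer.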

With the grammar above, "COMMAND EXIT" is successfully matched as a
CMD_EXIT token. However, "COMMAND XYZ" produces the error "line 1:8
mismatched character 'X' expecting 'E'", and what is left (only the
character Z) is matched as ID.

In the generated lexer class, in the mTokens() method, I noticed that
the lexer treats everything that starts with "COMMAND " as the CMD_EXIT
token. It simply does not consider the characters of the token
definition that come after the white space (i.e. 'E', 'X', 'I' and 'T')
during recognition.
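
The behaviour described above can be modelled with a toy Python sketch.
This is not the generated lexer code; predict_rule, match_cmd_exit and
first_token are hypothetical names, and the assumption that prediction
looks only at the "COMMAND " prefix is taken from my observation above.

```python
import string

LITERAL = "COMMAND EXIT"
PREFIX = "COMMAND "  # assumption: prediction inspects no more than this prefix

class MismatchedCharacter(Exception):
    def __init__(self, column, found, expected):
        super().__init__(
            f"line 1:{column} mismatched character {found!r} expecting {expected!r}"
        )
        self.column = column

def predict_rule(text, pos):
    """Pick a rule from the prefix alone, then commit to it."""
    return "CMD_EXIT" if text.startswith(PREFIX, pos) else "ID"

def match_cmd_exit(text, pos):
    """Match the full literal; no fallback to ID once this rule is chosen."""
    for offset, expected in enumerate(LITERAL):
        found = text[pos + offset: pos + offset + 1]
        if found != expected:
            raise MismatchedCharacter(pos + offset, found, expected)
    return pos + len(LITERAL)

def first_token(text):
    """Return (rule, matched text) for the first token of the input."""
    rule = predict_rule(text, 0)
    if rule == "CMD_EXIT":
        return rule, text[:match_cmd_exit(text, 0)]
    end = 0
    while end < len(text) and text[end] in string.ascii_letters:
        end += 1
    return rule, text[:end]

print(first_token("COMMAND EXIT"))
try:
    first_token("COMMAND XYZ")
except MismatchedCharacter as exc:
    print(exc)
```

Because the rule is chosen before 'E', 'X', 'I' and 'T' are examined, the
mismatch is only discovered at column 8, after the choice has been made.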

So, if you could enlighten me as to why this is happening, I would be
very grateful!

Best Regards,
Hari
