Let's go through all 4 of your examples:
1 "VARA"
All okay.
2 "VARVA"
"VAR"
is (obviously) tokenized as VAR
, but then the lexer "sees" "VA"
and expects an "R"
, which is not there. It emits the following errors:
line 1:5 mismatched character '<EOF>' expecting 'R'
line 1:5 required (...)+ loop did not match anything at input '<EOF>'
and discards the "VA"
resulting in a single token being created, as you can see when running ANTLRWorks' debugger (ignore the exceptions in the parse, they're not actually there :)).
The thing you must realize is that the lexer will never give up on something it has already matched. So if the lexer sees "VA"
and cannot match an "R"
after it, it will then look at the other lexer rules that can match "VA"
. But Letter
does not match that (it only matches single letters!). If you changed Letter
to match more than a single character, ANTLR would be able to fall back on that rule. But not when it matches a single letter: the lexer will not give up the "A"
from "VA"
to let the Letter
rule match. No way around it: this is how ANTLR's lexer works.
This is usually not an issue because there is often some sort of IDENTIFIER
rule that the lexer can fall back on when a keyword cannot be matched.
3 "VARVPP"
All okay: "VAR"
becomes a VAR
and then the lexer tries to match an "A"
after the "V"
but this does not happen, so the lexer falls back on the Letter
rule for the single "V"
. After that "PP"
are both tokenized as Letter
s.
4 "VARVALL"
"VAR"
again becomes a VAR
. Then the "L"
in "VAL"
causes the lexer to produce the following error message:
line 1:5 mismatched character 'L' expecting 'R'
and then the last "L"
becomes a Letter
.
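The four cases above can be replayed with a small toy model. This is only a sketch in Python, not ANTLR's actual implementation: it assumes a grammar with just the two lexer rules discussed here (a keyword rule matching "VAR" and a single-character Letter rule), assumes the input consists of upper case letters only, and assumes that the lexer picks a rule from one character of lookahead after "V" and then skips one character when the committed rule fails. The function name `tokenize` is made up for this illustration.

```python
def tokenize(text):
    """Toy model of a lexer that predicts a rule, then commits to it.

    Hypothetical helper, not part of ANTLR. Returns (tokens, errors).
    """
    tokens, errors = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if not ("A" <= ch <= "Z"):
            raise ValueError("this toy model only handles 'A'..'Z'")
        # Prediction: an "A" right after "V" selects the keyword rule.
        if ch == "V" and text[i + 1 : i + 2] == "A":
            # Committed: "V" and "A" are consumed, so only "R" may follow.
            nxt = text[i + 2 : i + 3]
            if nxt == "R":
                tokens.append(("VAR", "VAR"))
            else:
                shown = nxt if nxt else "<EOF>"
                errors.append(
                    f"line 1:{i + 2} mismatched character '{shown}' expecting 'R'"
                )
            # On failure the partial "VA" is dropped and the offending
            # character is skipped (assumed recovery strategy).
            i += 3
        else:
            # Any other letter, including a "V" not followed by "A".
            tokens.append(("Letter", ch))
            i += 1
    return tokens, errors


if __name__ == "__main__":
    for sample in ("VARA", "VARVA", "VARVPP", "VARVALL"):
        print(sample, tokenize(sample))
```

Note that the model never backs up to retry "V" as a Letter once it has consumed "VA", which is the behaviour described for examples 2 and 4.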
I guess (or hope) the first 3 questions are now answered, which leaves your final question:
How should I change this grammar to parse the way I expected?
By forcing the lexer to first look ahead in the character stream to see whether there really is "VAR"
ahead, and if there's not, just match a single "V"
and change the type of the matched token to Letter
, like this:
Declaration
 : ('VAR')=> 'VAR'
 | 'V' {$type=Letter;}
 ;
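To see what the predicate changes, here is a rough Python sketch of the same idea. It is a toy model rather than ANTLR's generated code: the function name `tokenize_with_predicate` is hypothetical, and the input is assumed to consist of upper case letters only. The point is that the whole "VAR" is checked before anything is consumed, so a lone "V" can still become a Letter.

```python
def tokenize_with_predicate(text):
    """Toy model of the Declaration rule guarded by a ('VAR')=> predicate.

    Hypothetical helper, not part of ANTLR. Returns a list of
    (token_type, token_text) pairs.
    """
    tokens = []
    i = 0
    while i < len(text):
        if text.startswith("VAR", i):
            # The predicate succeeded: all of "VAR" really is ahead.
            tokens.append(("Declaration", "VAR"))
            i += 3
        elif text[i] == "V":
            # Second alternative: match a single "V" and retype it as Letter.
            tokens.append(("Letter", "V"))
            i += 1
        else:
            # Every other character is handled by the Letter rule
            # (upper case letters assumed).
            tokens.append(("Letter", text[i]))
            i += 1
    return tokens


if __name__ == "__main__":
    for sample in ("VARVA", "VARVALL"):
        print(sample, tokenize_with_predicate(sample))
```

In this model "VARVA" no longer produces an error: the trailing "V" and "A" each come out as a Letter.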
As mentioned before my answer, see this related Q&A: ANTLR lexer can't lookahead at all