parsing - Overlapping Tokens in ANTLR 4

Question

I have the following ANTLR 4 combined grammar:

grammar Example;

fieldList:  field* ;

field:      'field' identifier '{' note '}' ;

note:       NOTE ;
identifier: IDENTIFIER ;

NOTE:       [A-Ga-g] ;
IDENTIFIER: [A-Za-z0-9]+ ;
WS:         [ \t\r\n]+ -> skip ;

This parses:

field x { A }
field x { B }

This does not:

field a { A }
field b { B }

In the case where parsing fails, I think the lexer is getting confused and emitting a NOTE token where I want it to emit an IDENTIFIER token.

Edit:

In the tokens coming out of the lexer, a 'NOTE' token shows up where the parser is expecting 'IDENTIFIER'. For a single letter such as 'a', both rules match the same text, and 'NOTE' wins because it is listed first in the grammar. I can think of two ways to fix this. First, I could alter the grammar to disambiguate 'NOTE' and 'IDENTIFIER' (for example, by adding a '$' in front of 'NOTE'). Or, I could just use 'IDENTIFIER' where I would use note and then deal with detecting issues when I walk the parse tree. Neither of those feels optimal. Surely there must be a better way to fix this?
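To make the lexer behavior concrete, here is a minimal Python sketch. It is not ANTLR; it only imitates the way the ANTLR lexer picks a token (the longest match wins, and on a tie the rule listed first wins) using the `re` module. The `RULES` table and the `tokenize` helper are hypothetical names made up for this illustration, and the literals 'field', '{' and '}' are placed first on the assumption that implicit literal tokens take priority over the named lexer rules in a combined grammar.

```python
import re

# Lexer rules in grammar order. The implicit literal tokens come first.
RULES = [
    ("'field'", re.compile(r"field")),
    ("'{'", re.compile(r"\{")),
    ("'}'", re.compile(r"\}")),
    ("NOTE", re.compile(r"[A-Ga-g]")),
    ("IDENTIFIER", re.compile(r"[A-Za-z0-9]+")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]

def tokenize(text):
    """Return (token_type, text) pairs: longest match, first rule on ties."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so an earlier rule keeps the token when lengths are equal.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name != "WS":  # WS is skipped, as in the grammar
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

print(tokenize("field x { A }"))  # 'x' can only be an IDENTIFIER
print(tokenize("field a { A }"))  # 'a' ties between NOTE and IDENTIFIER
```

In this sketch `x` comes out as IDENTIFIER, while `a` comes out as NOTE because both rules match one character and NOTE is listed first. A longer name such as `ab` would still be an IDENTIFIER, since the longer match wins before rule order is considered.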

score 5 · Accepted Answer

I actually ended up solving it like this:

grammar Example;

fieldList:  field* ;

field:      'field' identifier '{' note '}' ;

note:       NOTE ;
identifier: IDENTIFIER | NOTE ;

NOTE:       [A-Ga-g] ;
IDENTIFIER: [A-Za-z0-9]+ ;
WS:         [ \t\r\n]+ -> skip ;

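My parse tree still ends up looking the way I want it to.

The actual grammar I'm developing is more complicated, and so is the workaround based on this approach. But in general, this approach seems to work well.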
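Why this works can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ANTLR output: the lexer is unchanged and still emits NOTE for a single letter in A-G, but the `identifier` parser rule now accepts either token type. The function `matches_field` and the hand-written token-type lists are hypothetical and stand in for what the lexer would produce.

```python
def matches_field(token_types, identifier_types):
    """Check a token-type sequence against: 'field' identifier '{' note '}'.

    identifier_types is the set of token types that the `identifier`
    parser rule accepts.
    """
    if len(token_types) != 5:
        return False
    keyword, ident, lbrace, note, rbrace = token_types
    return (
        keyword == "'field'"
        and ident in identifier_types
        and lbrace == "'{'"
        and note == "NOTE"
        and rbrace == "'}'"
    )

# Assumed token types for `field a { A }`: `a` is lexed as NOTE.
tokens_a = ["'field'", "NOTE", "'{'", "NOTE", "'}'"]
# Assumed token types for `field x { A }`: `x` can only be an IDENTIFIER.
tokens_x = ["'field'", "IDENTIFIER", "'{'", "NOTE", "'}'"]

original = {"IDENTIFIER"}       # identifier: IDENTIFIER ;
fixed = {"IDENTIFIER", "NOTE"}  # identifier: IDENTIFIER | NOTE ;

print(matches_field(tokens_a, original))  # False
print(matches_field(tokens_a, fixed))     # True
print(matches_field(tokens_x, fixed))     # True
```

The note position still accepts only NOTE, so `field a { x }` would continue to be rejected.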

score 1 · Accepted Answer

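A quick and dirty fix for your problem: change IDENTIFIER so that it matches only the characters that NOTE does not match (the complement of NOTE). Then combine the two in identifier.

The resulting grammar: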

grammar Example;

fieldList:  field* ;

field:      'field' identifier '{' note '}' ;

note:       NOTE ;
identifier: (NOTE|IDENTIFIER_C)+ ;

NOTE:       [A-Ga-g] ;
IDENTIFIER_C: [H-Zh-z0-9] ;
WS:         [ \t\r\n]+ -> skip ;

The downside of this solution is that you don't get the identifier as a single token; instead, every character becomes its own token.
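The per-character tokenization can be seen in a small Python sketch. Again, this is not ANTLR, just an imitation of longest-match lexing with first-rule tie-breaking; the `RULES` table and `tokenize` function are hypothetical names, and the literal tokens are listed first by assumption.

```python
import re

# Rules of the grammar above, in order; 'field', '{' and '}' stand for
# the implicit literal tokens of the combined grammar.
RULES = [
    ("'field'", re.compile(r"field")),
    ("'{'", re.compile(r"\{")),
    ("'}'", re.compile(r"\}")),
    ("NOTE", re.compile(r"[A-Ga-g]")),
    ("IDENTIFIER_C", re.compile(r"[H-Zh-z0-9]")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]

def tokenize(text):
    """Return (token_type, text) pairs: longest match, first rule on ties."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (rule name, end offset) of the best match so far
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        name, end = best
        if name != "WS":  # WS is skipped, as in the grammar
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens

# The identifier x1a is split into three single-character tokens.
print(tokenize("field x1a { A }"))
```

Here `x1a` becomes IDENTIFIER_C, IDENTIFIER_C, NOTE, so the text of the identifier has to be reassembled from the `identifier` parser rule rather than read from one token.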
