I have the following ANTLR 4 combined grammar:
grammar Example;
fieldList: field* ;
field: 'field' identifier '{' note '}' ;
note: NOTE ;
identifier: IDENTIFIER ;
NOTE: [A-Ga-g] ;
IDENTIFIER: [A-Za-z0-9]+ ;
WS: [ \t\r\n]+ -> skip ;
This parses:
field x { A }
field x { B }
This does not:
field a { A }
field b { B }
In the case where parsing fails, I think the lexer is getting confused and emitting a NOTE token where I want it to emit an IDENTIFIER token.
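To check that suspicion without running ANTLR, here is a minimal Python sketch that imitates how I understand the lexer to choose tokens: the rule matching the most input wins, and when two rules match the same length, the one listed first wins. This is a simulation, not ANTLR itself. The `RULES` table and the `tokenize` helper are hypothetical names, and I am assuming the literals 'field', '{' and '}' from the parser rules act as implicit token rules placed ahead of the explicit ones.

```python
import re

# Token rules in the order the lexer is assumed to consider them.
# Assumption: literals from parser rules come before NOTE, IDENTIFIER and WS.
RULES = [
    ("'field'", re.compile(r"field")),
    ("'{'", re.compile(r"\{")),
    ("'}'", re.compile(r"\}")),
    ("NOTE", re.compile(r"[A-Ga-g]")),
    ("IDENTIFIER", re.compile(r"[A-Za-z0-9]+")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Return (rule name, matched text) pairs, dropping WS like '-> skip'."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule listed earlier is kept.
            if match and match.end() - pos > best_len:
                best_name, best_len = name, match.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("field x { A }"))
print(tokenize("field a { A }"))
```

In this sketch, `x` can only match IDENTIFIER, but `a` matches both NOTE and IDENTIFIER with length 1, so the earlier rule (NOTE) is chosen. That is consistent with the failure I am seeing.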
Edit:
In the tokens coming out of the lexer, a 'NOTE' token shows up where the parser expects 'IDENTIFIER'. Both rules match a single letter such as a or b with the same length, and 'NOTE' wins the tie because it is defined first in the grammar. I can think of two ways to fix this. First, I could alter the grammar to disambiguate 'NOTE' and 'IDENTIFIER' (for example, by requiring a '$' in front of 'NOTE'). Second, I could use 'IDENTIFIER' where I currently use 'NOTE' and then detect invalid notes when I walk the parse tree. Neither of those feels optimal. Surely there must be a better way to fix this?
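For reference, the second workaround might look roughly like the Python sketch below. It assumes the grammar has been changed so that the note position is lexed as IDENTIFIER, and that the (identifier, note text) pairs have already been collected from the parse tree. The names `is_note` and `check_fields` are hypothetical; in a real ANTLR listener or visitor the same check would run on the text of the matched token.

```python
import re

# Same character set as the NOTE lexer rule: exactly one letter A-G or a-g.
NOTE_PATTERN = re.compile(r"[A-Ga-g]")


def is_note(text):
    """Return True if text is a single letter accepted by the NOTE rule."""
    return NOTE_PATTERN.fullmatch(text) is not None


def check_fields(fields):
    """Validate (identifier, note_text) pairs and return error messages."""
    errors = []
    for identifier, note_text in fields:
        if not is_note(note_text):
            errors.append(f"field {identifier}: '{note_text}' is not a valid note")
    return errors


print(check_fields([("a", "A"), ("b", "B")]))
print(check_fields([("x", "H"), ("y", "AB")]))
```

This works, but it moves a purely lexical constraint out of the grammar and into hand-written code, which is why it does not feel ideal to me.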