I am using ANTLR4, and this is a simplified version of the grammar I wrote:
grammar BooleanExpression;

/*******************************
 * Parser Rules
 *******************************/
booleanTerm
    : booleanLiteral (KW_OR booleanLiteral)+
    | booleanLiteral
    ;

id
    : IDENTIFIER
    ;

booleanLiteral
    : KW_TRUE
    | KW_FALSE
    ;

/*******************************
 * Lexer Rules
 *******************************/
KW_TRUE
    : 'true'
    ;

KW_FALSE
    : 'false'
    ;

KW_OR
    : 'or'
    ;

IDENTIFIER
    : (SIMPLE_LATIN)+
    ;

fragment
SIMPLE_LATIN
    : 'A' .. 'Z'
    | 'a' .. 'z'
    ;

WHITESPACE
    : [ \t\n\r]+ -> skip
    ;
I use a BailErrorStrategy and a BailLexer, shown below:
import org.antlr.v4.runtime.DefaultErrorStrategy;
import org.antlr.v4.runtime.InputMismatchException;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Token;

public class BailErrorStrategy extends DefaultErrorStrategy {
    /**
     * Instead of recovering from exception e, rethrow it wrapped in a generic
     * IllegalArgumentException so it is not caught by the rule function catches.
     * Exception e is the "cause" of the IllegalArgumentException.
     */
    @Override
    public void recover(Parser recognizer, RecognitionException e) {
        throw new IllegalArgumentException(e);
    }

    /**
     * Make sure we don't attempt to recover inline; if the parser successfully
     * recovers, it won't throw an exception.
     */
    @Override
    public Token recoverInline(Parser recognizer) throws RecognitionException {
        throw new IllegalArgumentException(new InputMismatchException(recognizer));
    }

    /** Make sure we don't attempt to recover from problems in subrules. */
    @Override
    public void sync(Parser recognizer) {
    }

    @Override
    protected Token getMissingSymbol(Parser recognizer) {
        throw new IllegalArgumentException(new InputMismatchException(recognizer));
    }
}
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.LexerNoViableAltException;
import org.antlr.v4.runtime.RecognitionException;

public class BailLexer extends BooleanExpressionLexer {
    public BailLexer(CharStream input) {
        super(input);
        //removeErrorListeners();
        //addErrorListener(new ConsoleErrorListener());
    }

    @Override
    public void recover(LexerNoViableAltException e) {
        throw new IllegalArgumentException(e); // Bail out
    }

    @Override
    public void recover(RecognitionException re) {
        throw new IllegalArgumentException(re); // Bail out
    }
}
Everything works fine except for one case. I tried the following expression:
true OR false
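I expected this expression to be rejected with an IllegalArgumentException, because the "or" keyword must be lowercase, not uppercase. Instead, ANTLR4 does not reject it. The input is tokenized as "KW_TRUE IDENTIFIER KW_FALSE", which is expected, since the uppercase "OR" is treated as an IDENTIFIER. The parser, however, throws no error while processing this token stream: it builds a tree containing only "true" and discards the remaining "IDENTIFIER KW_FALSE" tokens. I tried the different prediction modes, but they all behave the same way. I had no idea why it works like this, so I did some debugging, which eventually led me to this piece of code in ANTLR: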
ATNConfigSet reach = computeReachSet(previous, t, false);
if ( reach==null ) {
    // if any configs in previous dipped into outer context, that
    // means that input up to t actually finished entry rule
    // at least for SLL decision. Full LL doesn't dip into outer
    // so don't need special case.
    // We will get an error no matter what so delay until after
    // decision; better error message. Also, no reachable target
    // ATN states in SLL implies LL will also get nowhere.
    // If conflict in states that dip out, choose min since we
    // will get error no matter what.
    int alt = getAltThatFinishedDecisionEntryRule(previousD.configs);
    if ( alt!=ATN.INVALID_ALT_NUMBER ) {
        // return w/o altering DFA
        return alt;
    }
    throw noViableAlt(input, outerContext, previous, startIndex);
}
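The line "int alt = getAltThatFinishedDecisionEntryRule(previousD.configs);" returns the second alternative of booleanTerm (because "true" matches the second alternative, "booleanLiteral"). Since that value is not ATN.INVALID_ALT_NUMBER, noViableAlt is not thrown at this point. According to the Java comment there, "We will get an error no matter what so delay until after decision", yet in the end no error seems to be thrown at all.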
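To make the observed behavior concrete, here is a minimal hand-written model of it in plain Java. It does not use ANTLR and is not ANTLR's prediction algorithm; the class name PrefixParseSketch, its helper methods, and the requireEof flag are all hypothetical names introduced only for illustration. The sketch tokenizes the input according to the lexer rules above, then matches booleanTerm at the start of the token list and reports how many tokens were consumed. Without the end-of-input check it stops after "true", just like the parser I am seeing; with the check it rejects the input, which is the behavior I want.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written model of the grammar above; it does not use ANTLR. */
class PrefixParseSketch {

    /** Splits the input into token type names, mimicking the lexer rules. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                i++; // WHITESPACE -> skip
                continue;
            }
            if (!isSimpleLatin(c)) {
                // Same effect as BailLexer: give up on the first bad character.
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
            int start = i;
            while (i < input.length() && isSimpleLatin(input.charAt(i))) {
                i++;
            }
            String word = input.substring(start, i);
            // Keywords are case-sensitive; any other run of letters is an IDENTIFIER.
            switch (word) {
                case "true":  tokens.add("KW_TRUE"); break;
                case "false": tokens.add("KW_FALSE"); break;
                case "or":    tokens.add("KW_OR"); break;
                default:      tokens.add("IDENTIFIER"); break;
            }
        }
        return tokens;
    }

    static boolean isSimpleLatin(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    /**
     * Matches booleanTerm at the start of the token list and returns how many
     * tokens were consumed. With requireEof (a hypothetical flag), leftover
     * tokens are treated as an error.
     */
    static int parseBooleanTerm(List<String> tokens, boolean requireEof) {
        int pos = expectLiteral(tokens, 0);
        while (pos < tokens.size() && tokens.get(pos).equals("KW_OR")) {
            pos = expectLiteral(tokens, pos + 1);
        }
        if (requireEof && pos < tokens.size()) {
            throw new IllegalArgumentException("unexpected " + tokens.get(pos) + " at token " + pos);
        }
        return pos;
    }

    /** Consumes one booleanLiteral at pos and returns the next position. */
    static int expectLiteral(List<String> tokens, int pos) {
        if (pos < tokens.size()
                && (tokens.get(pos).equals("KW_TRUE") || tokens.get(pos).equals("KW_FALSE"))) {
            return pos + 1;
        }
        throw new IllegalArgumentException("expected a boolean literal at token " + pos);
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("true OR false");
        System.out.println(tokens);                          // [KW_TRUE, IDENTIFIER, KW_FALSE]
        System.out.println(parseBooleanTerm(tokens, false)); // 1
        try {
            parseBooleanTerm(tokens, true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In the model, the only difference between the two outcomes is whether the rule insists on reaching the end of the input. I suspect the grammar-level equivalent would be to end the entry rule with ANTLR's built-in EOF token (for example, a wrapper rule of the form "booleanTerm EOF"), but I have not confirmed that this is the intended way to do it.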
I really don't know how to get ANTLR to report an error in this case. Can someone shed some light on this? Any help is appreciated, thanks.