antlr - Antlr rule priorities

Question

Firstly, I know this grammar doesn't make sense, but it was created to test ANTLR's rule priority behaviour:

grammar test;

options 
{

output=AST;
backtrack=true;
memoize=true;

}

rule_list_in_order :
    (
    first_rule
    | second_rule
    | any_left_over_tokens)+
    ;


first_rule
    :
     FIRST_TOKEN
    ;


second_rule:     
    FIRST_TOKEN NEW_LINE SECOND_TOKEN NEW_LINE;


any_left_over_tokens
    :
    NEW_LINE
    | FIRST_TOKEN
    | SECOND_TOKEN;



FIRST_TOKEN
    : 'First token here'
    ;   

SECOND_TOKEN
    : 'Second token here';

NEW_LINE
    : ('\r'?'\n')   ;

WS  : (' '|'\t'|'\u000C')
    {$channel=HIDDEN;}
    ;

When I give this grammar the input 'First token here\nSecond token here', it matches second_rule.

I would have expected it to match first_rule and then any_left_over_tokens, because first_rule appears before second_rule in rule_list_in_order, which is the start rule. Can anyone explain why this happens?

Cheers

score 19 · Accepted Answer

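First, ANTLR's lexer tries the lexer rules from top to bottom, so tokens defined first take precedence over those defined below them. If rules overlap, the rule that matches the most characters wins (greedy match).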
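The precedence described above can be sketched with a small hand-written matcher. This is only a simplified model (longest match wins, and on a tie the rule defined first wins), not ANTLR's generated lexer. The class `LexerPriorityDemo`, the method `tokenize`, and the rule names `A`, `B` and `C` are hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of lexer rule precedence; NOT how ANTLR is implemented.
class LexerPriorityDemo {

    // rules[i] = {tokenName, literalText}, listed in definition order (top to bottom).
    static List<String> tokenize(String[][] rules, String input) {
        List<String> names = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestLength = 0;
            for (String[] rule : rules) {
                String literal = rule[1];
                // A strictly longer match replaces the current best;
                // on a tie the earlier (higher) rule is kept.
                if (input.startsWith(literal, pos) && literal.length() > bestLength) {
                    bestName = rule[0];
                    bestLength = literal.length();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            names.add(bestName);
            pos += bestLength;
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical overlapping rules: A and C match the same text, B matches more.
        String[][] rules = {
            {"A", "ab"},
            {"B", "abc"},
            {"C", "ab"},
        };
        System.out.println(tokenize(rules, "abcab")); // prints [B, A]
    }
}
```

On the input `abcab`, `B` wins first because it consumes three characters; the remaining `ab` is matched by both `A` and `C`, and `A` wins because it is defined first.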

The same principle holds for parser rules: alternatives defined first are tried first. For example, in rule foo, sub-rule a is tried before b:

foo
  :  a
  |  b
  ;

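Note that in your case second_rule isn't actually matched. The parser tries to match it but fails because there is no trailing line break, producing the error: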

line 0:-1 mismatched input '<EOF>' expecting NEW_LINE

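So nothing is matched at all. But that is odd: since you have set backtrack=true, it should at least backtrack and match:

first_rule ("First token here")
any_left_over_tokens ("line-break")
any_left_over_tokens ("Second token here")

if it does not simply match first_rule in the first place, without even trying second_rule.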
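The behaviour expected here can be sketched in plain Java as ordered alternatives, each guarded by a speculative check. This is a hand-written model under the assumption that alternatives are tried strictly in the order written; it is not ANTLR-generated code, and the class `OrderedAlternativesDemo` and its methods `matchesAt` and `parse` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hand-written model of ( first_rule | second_rule | any_left_over_tokens )+
// with each alternative guarded by a speculative match. Illustration only.
class OrderedAlternativesDemo {

    // Token sequences required by the parser rules in the question's grammar.
    static final List<String> FIRST_RULE = Arrays.asList("FIRST_TOKEN");
    static final List<String> SECOND_RULE =
            Arrays.asList("FIRST_TOKEN", "NEW_LINE", "SECOND_TOKEN", "NEW_LINE");

    // Speculative match: true if all tokens of `rule` appear starting at `pos`.
    static boolean matchesAt(List<String> tokens, int pos, List<String> rule) {
        if (pos + rule.size() > tokens.size()) {
            return false;
        }
        return tokens.subList(pos, pos + rule.size()).equals(rule);
    }

    // Tries the alternatives in the order they are written and records which one matched.
    static List<String> parse(List<String> tokens) {
        List<String> matched = new ArrayList<>();
        int pos = 0;
        while (pos < tokens.size()) {
            if (matchesAt(tokens, pos, FIRST_RULE)) {
                matched.add("first_rule");
                pos += FIRST_RULE.size();
            } else if (matchesAt(tokens, pos, SECOND_RULE)) {
                matched.add("second_rule");
                pos += SECOND_RULE.size();
            } else {
                // any_left_over_tokens accepts any single token.
                matched.add("any_left_over_tokens");
                pos += 1;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Token stream for "First token here\nSecond token here" (no trailing newline).
        List<String> tokens = Arrays.asList("FIRST_TOKEN", "NEW_LINE", "SECOND_TOKEN");
        System.out.println(parse(tokens));
        // prints [first_rule, any_left_over_tokens, any_left_over_tokens]
    }
}
```

In this model second_rule can never win, because every input it accepts also starts with FIRST_TOKEN and first_rule is checked before it.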

A quick demo that writes the syntactic predicates by hand (and disables backtrack in the options { ... } section) looks like this:

grammar T;

options {
  output=AST;
  //backtrack=true;
  memoize=true;
}

rule_list_in_order
  :  ( (first_rule)=>  first_rule  {System.out.println("first_rule=[" + $first_rule.text + "]");}
     | (second_rule)=> second_rule {System.out.println("second_rule=[" + $second_rule.text + "]");}
     | any_left_over_tokens        {System.out.println("any_left_over_tokens=[" + $any_left_over_tokens.text + "]");}
     )+ 
  ;

first_rule
  :  FIRST_TOKEN
  ;

second_rule
  :  FIRST_TOKEN NEW_LINE SECOND_TOKEN NEW_LINE
  ;

any_left_over_tokens
  :  NEW_LINE
  |  FIRST_TOKEN
  |  SECOND_TOKEN
  ;

FIRST_TOKEN  : 'First token here';   
SECOND_TOKEN : 'Second token here';
NEW_LINE     : ('\r'?'\n');
WS           : (' '|'\t'|'\u000C') {$channel=HIDDEN;};

which can be tested with the class:

import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String source = "First token here\nSecond token here";
        ANTLRStringStream in = new ANTLRStringStream(source);
        TLexer lexer = new TLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TParser parser = new TParser(tokens);
        parser.rule_list_in_order();
    }
}

which produces the expected output:

first_rule=[First token here]
any_left_over_tokens=[
]
any_left_over_tokens=[Second token here]

Note that it doesn't matter whether you use:

rule_list_in_order
  :  ( (first_rule)=>  first_rule 
     | (second_rule)=> second_rule
     | any_left_over_tokens
     )+ 
  ;

or

rule_list_in_order
  :  ( (second_rule)=> second_rule // <--+--- swapped
     | (first_rule)=>  first_rule  // <-/
     | any_left_over_tokens
     )+ 
  ;

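Both produce the expected output.

So my guess is that you may have found a bug.

If you want a definitive answer, you could try the ANTLR mailing list (Terence Parr is there more often than he is here).

Good luck!

PS. I tested this with ANTLR v3.2.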
