grammar - Cannot encode precedence of the block rule over the statement rule in tree-sitter

Question

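I am trying to encode a simple grammar that covers both plain statements and statements enclosed in a block. A block has its own special keyword. I have set the block rule's precedence to zero, but tree-sitter still does not match it, even though it reports an error, i.e. the other rules do not match either. Nevertheless, it refuses to match the block! Why, and how do I fix it?

The code: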

area = pi*r^2;

block {
    r=12;
}

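tree-sitter matches the entire sequence block { r=12; as a statement, even though braces are not allowed in statements. So it reports an error, yet it does not try the block rule, although that rule is applicable.

The grammar: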

module.exports = grammar({
    name: 'test',

    rules: {
        source_file: $ => seq(
            repeat(choice($.block, $.statement_with_semicolon)),
            optional($.statement_without_semicolon)
        ),

        block: $ => prec(1, seq(
            "block",
            "{",
            repeat( $.statement_with_semicolon ),
            optional( $.statement_without_semicolon),
            "}",
            optional(";")
        )),

        statement_without_semicolon: $ => $.token_chain,

        statement_with_semicolon: $ => seq(
            $.token_chain,
            ";"
        ),

        token_chain: $ => repeat1(
            $.token
        ),

        token: $ => choice(
            $.alphanumeric,
            $.punctuation
        ),

        alphanumeric: $ => /[a-zA-Zα-ωΑ-Ωа-яА-Я0-9]+/,

        punctuation: $ => /[^a-zA-Zα-ωΑ-Ωа-яА-Я0-9"{}\(\)\[\];]+/
    }
});

Output:

>tree-sitter parse example-file
(source_file [0, 0] - [4, 1]
  (statement_with_semicolon [0, 0] - [0, 14]
    (token_chain [0, 0] - [0, 13]
      (token [0, 0] - [0, 4]
        (alphanumeric [0, 0] - [0, 4]))
      (token [0, 4] - [0, 7]
        (punctuation [0, 4] - [0, 7]))
      (token [0, 7] - [0, 9]
        (alphanumeric [0, 7] - [0, 9]))
      (token [0, 9] - [0, 10]
        (punctuation [0, 9] - [0, 10]))
      (token [0, 10] - [0, 11]
        (alphanumeric [0, 10] - [0, 11]))
      (token [0, 11] - [0, 12]
        (punctuation [0, 11] - [0, 12]))
      (token [0, 12] - [0, 13]
        (alphanumeric [0, 12] - [0, 13]))))
  (statement_with_semicolon [0, 14] - [3, 9]
    (token_chain [0, 14] - [3, 8]
      (token [0, 14] - [2, 0]
        (punctuation [0, 14] - [2, 0]))
      (token [2, 0] - [2, 5]
        (alphanumeric [2, 0] - [2, 5]))
      (token [2, 5] - [2, 6]
        (punctuation [2, 5] - [2, 6]))
      (ERROR [2, 6] - [2, 7])
      (token [2, 7] - [3, 4]
        (punctuation [2, 7] - [3, 4]))
      (token [3, 4] - [3, 5]
        (alphanumeric [3, 4] - [3, 5]))
      (token [3, 5] - [3, 6]
        (punctuation [3, 5] - [3, 6]))
      (token [3, 6] - [3, 8]
        (alphanumeric [3, 6] - [3, 8]))))
  (statement_without_semicolon [3, 9] - [4, 0]
    (token_chain [3, 9] - [4, 0]
      (token [3, 9] - [4, 0]
        (punctuation [3, 9] - [4, 0]))))
  (ERROR [4, 0] - [4, 1]))
example-file    0 ms    (ERROR [2, 6] - [2, 7])

score 1 · Accepted Answer

Your problem is that your punctuation regex matches the newline characters \n and \r, as you can see here:

  (statement_with_semicolon [0, 14] - [3, 9]
    (token_chain [0, 14] - [3, 8]
      (punctuation [0, 14] - [2, 0]))

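See how it matches the end of line zero and the blank line one? Because that newline run has already started a new token_chain, by the time the parser reaches block it treats block as just another alphanumeric token inside the statement_with_semicolon match, not as the block keyword. You can fix this by changing the punctuation definition to: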

punctuation: $ => /[^a-zA-Zα-ωΑ-Ωа-яА-Я0-9"{}\(\)\[\];\n\r]+/
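The difference between the two character classes can be checked outside tree-sitter. The following is only a sketch using plain JavaScript RegExp objects, which are not tree-sitter's lexer; the helper matchAt and the variable names are made up for this check, and the input string is the example from the question.

```javascript
// Character classes copied from the question (original) and this answer (fixed).
const original = /[^a-zA-Zα-ωΑ-Ωа-яА-Я0-9"{}\(\)\[\];]+/;
const fixed = /[^a-zA-Zα-ωΑ-Ωа-яА-Я0-9"{}\(\)\[\];\n\r]+/;

// The example input from the question.
const input = "area = pi*r^2;\n\nblock {\n    r=12;\n}\n";

// Hypothetical helper: try to match `regex` exactly at `index`, roughly the
// way a lexer asks "which token starts here?". Returns the matched text or null.
function matchAt(regex, text, index) {
  const sticky = new RegExp(regex.source, "y");
  sticky.lastIndex = index;
  const m = sticky.exec(text);
  return m === null ? null : m[0];
}

// Position right after the first ";" (row 0, column 14 in the parse output).
const afterSemicolon = input.indexOf(";") + 1;

const oldMatch = matchAt(original, input, afterSemicolon);
const newMatch = matchAt(fixed, input, afterSemicolon);

console.log(JSON.stringify(oldMatch)); // "\n\n": the newlines become a punctuation token
console.log(JSON.stringify(newMatch)); // null: no punctuation token can start here
```

With the original class, the two newlines form a punctuation token, which is what opens the second token_chain in the parse output. With the fixed class nothing matches at that position, so the newlines can presumably be skipped as whitespace and the block keyword is seen at the start of a new top-level item.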

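However, this probably won't be the last problem of this kind that you run into, so you may want to rewrite the grammar to be more explicit about which punctuation it accepts and where. For example, define a set of valid operators.

This also answers your other question.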
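As a rough illustration of an explicit operator set, here is a minimal sketch. The list OPERATORS, the rule name operator, and the helper isOperator are all hypothetical; the actual operators depend on the language being parsed.

```javascript
// Hypothetical operator set; extend it to whatever the language really allows.
const OPERATORS = ["=", "+", "-", "*", "/", "^", ","];

// In grammar.js, the catch-all `punctuation` rule could then be replaced by
// something like
//     operator: $ => choice(...OPERATORS)
// where `choice` is tree-sitter's DSL function and `operator` is a made-up
// rule name. Whitespace would no longer be part of any token.

// Plain-JavaScript stand-in for that rule, only to show what the list accepts.
const isOperator = text => OPERATORS.includes(text);

console.log(isOperator("="));    // true
console.log(isOperator("\n\n")); // false: newlines are not an operator
console.log(isOperator("{"));    // false: braces stay reserved for blocks
```

An allow-list like this trades flexibility for predictability: unknown characters produce an error at the offending character and are not silently absorbed into a statement.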
