I am using Tree-sitter to parse Clojure code. Specifically, I want to distinguish between symbols, class names, and Java interop.
This is my grammar:
module.exports = grammar({
  name: 'clojure',
  extras: $ => [/[\s,]/],
  rules: {
    program: $ => repeat($._anything),
    _anything: $ => choice($.symbol, $.classname, $.member_access, $.new_class),
    symbol: $ => $._symbol_chars,
    classname: $ => prec.left(3, seq($._symbol_chars, repeat1($._classname_part))),
    _classname_part: $ => prec.right(3, seq($._dot, $._symbol_chars)),
    member_access: $ => seq($._dot, $._class_chars),
    new_class: $ => prec(2, seq(choice($.symbol, $.classname), $._dot)),
    _dot: $ => /\.{1}/,
    _symbol_chars: $ => /[a-zA-Z\*\+\!\-_\?][\w\*\+\!\-\?\':]*/,
    _class_chars: $ => /[a-zA-Z_]\w*/
  }
})
I expect
foo
java.lang.String
.toUpperCase
java.awt.Point.
to be parsed as
(program
  (symbol)
  (classname)
  (member_access)
  (new_class (classname)))
But Tree-sitter keeps seeing (new_class (classname)) (classname) instead of (classname) for java.lang.String. I think I need some kind of greedy matching, and I have tried prec.right() in various places, but to no avail. What am I missing?
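To make the ambiguity concrete, here is a minimal sketch in plain JavaScript, outside Tree-sitter, of how the four inputs break into pieces under the regexes from the grammar. It assumes ordinary JavaScript regex semantics are a close enough stand-in for the generated lexer, and splitOnDots is a hypothetical helper written for this illustration, not a Tree-sitter API. Since _symbol_chars cannot match a dot, a dotted name is never a single token, so the parser has to choose at each dot between continuing a classname and ending a new_class.

```javascript
// Regexes copied from the grammar, anchored so they test a whole string.
const symbolChars = /^[a-zA-Z\*\+\!\-_\?][\w\*\+\!\-\?\':]*$/;
const classChars = /^[a-zA-Z_]\w*$/;

// Hypothetical helper (not part of Tree-sitter): split an input on dots,
// keeping each dot as its own piece, roughly as the _dot token would.
function splitOnDots(input) {
  return input.split(/(\.)/).filter(piece => piece !== '');
}

const samples = ['foo', 'java.lang.String', '.toUpperCase', 'java.awt.Point.'];

// Map each sample to its dot-separated pieces.
const pieces = Object.fromEntries(samples.map(s => [s, splitOnDots(s)]));

for (const sample of samples) {
  // A whole-string match is only possible when the sample contains no dot.
  const wholeSymbol = symbolChars.test(sample);
  console.log(sample, JSON.stringify(pieces[sample]), 'single symbol token:', wholeSymbol);
}
```

Only foo matches _symbol_chars as a whole. After java the next piece is a dot in both java.lang.String and java.awt.Point., which is exactly where classname and new_class compete.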