I'm using Instaparse to parse expressions like:
$(foo bar baz $(frob))
into something like:
[:expr "foo" "bar" "baz" [:expr "frob"]]
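I've almost got it, but I'm running into trouble with ambiguity. Here is a simplified version of my grammar, which tries to rely on negative lookahead: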
(def simple
(insta/parser
"expr = <dollar> <lparen> word (<space> word)* <rparen>
<word> = !(dollar lparen) #'.+' !(rparen)
<space> = #'\\s+'
<dollar> = <'$'>
<lparen> = <'('>
<rparen> = <')'>"))
(simple "$(foo bar)")
which fails with:
Parse error at line 1, column 11:
$(foo bar)
          ^
Expected one of:
")"
#"\s+"
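Here I've said that a word can be any characters, to support expressions like:
$(foo () `bar` b-a-z)
and so on. Note that a word can contain () but not $(). I'm not sure how to express this in the grammar. The problem seems to be that <word> is too greedy: it consumes the final ) instead of leaving it for expr.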
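As far as I can tell, this comes down to how Instaparse handles regex terminals: it matches the regex once at the current position and keeps that single greedy match, without backtracking into the regex to hand characters back. Plain Clojure regexes show what #'.+' sees once $( has been consumed. The string below is just the remainder of the input $(foo bar), chosen for illustration:

```clojure
;; The text left over after the parser has consumed the leading "$(".
(def remaining "foo bar)")

;; `.+` is greedy: it takes every remaining character, including the ")"
;; that the expr rule still needs, so <rparen> has nothing left to match.
;; The negative lookahead !(rparen) is only checked after the regex has
;; matched, at the end of the input, where it trivially succeeds.
(def greedy-word (re-find #".+" remaining))

(println greedy-word)
```

This matches the error above: the failure is reported at column 11, i.e. the end of the input, where only ")" or more whitespace could still follow.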
Update: disallowed spaces in words:
(def simple2
(insta/parser
"expr = <dollar> <lparen> word (<space> word)* <rparen>
<word> = !(dollar lparen) #'[^ ]+' !(rparen)
<space> = #'\\s+'
<dollar> = <'$'>
<lparen> = <'('>
<rparen> = <')'>"))
(simple2 "$(foo bar)")
; Parse error at line 1, column 11:
; $(foo bar)
;           ^
; Expected one of:
; ")"
; #"\s+"
(simple2 "$(foo () bar)")
; Parse error at line 1, column 14:
; $(foo () bar)
;              ^
; Expected one of:
; ")"
; #"\s+"
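The same check explains why simple2 still fails: ) is not a space, so #'[^ ]+' swallows the final bar) whole in both test inputs. Excluding whitespace, $, ( and ) from the character class makes the word stop just before the closing paren. The trade-off is that a word can then no longer contain (), so parenthesized groups need a rule of their own. The input string is again chosen for illustration:

```clojure
;; The last word of both inputs, as the word regex sees it.
(def last-word "bar)")

;; simple2's word regex: ")" is not a space, so it is swallowed too.
(def with-paren (re-find #"[^ ]+" last-word))

;; A class that also excludes whitespace, "$", "(" and ")" stops right
;; before the ")" that should close the expression.
(def without-paren (re-find #"[^\s$()]+" last-word))

(println with-paren without-paren)
```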
Update 2: two more test cases:
(simple2 "$(foo bar ())")
(simple2 "$((foo bar baz))")
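One way to make the simplified grammar handle all of these cases (the same idea the full parser in Update 3 uses) is to keep $, ( and ) out of the word regex and give parenthesized groups their own rule, so no lookahead is needed. This is a minimal sketch, assuming Instaparse is on the classpath; the names simple3, parened and chars are my own choices:

```clojure
(require '[instaparse.core :as insta])

;; expr    : "$(" followed by space-separated words and a closing ")".
;; word    : a nested $(...) expression, a parenthesized group, or a run
;;           of characters containing no whitespace, "$", "(" or ")".
;; parened : an explicit (...) group, possibly empty, so "()" can appear
;;           among the words without being taken as the closing ")".
(def simple3
  (insta/parser
    "expr    = <'$('> words <')'>
     <words> = word (<space> word)*
     <word>  = expr | parened | chars
     parened = '(' words? ')'
     <chars> = #'[^\\s$()]+'
     <space> = #'\\s+'"))

(simple3 "$(foo bar baz $(frob))")
;; => [:expr "foo" "bar" "baz" [:expr "frob"]]

(simple3 "$(foo bar ())")
;; => [:expr "foo" "bar" [:parened "(" ")"]]
```

Two consequences of this design: a lone $ that is not followed by ( is rejected, and (...) groups come back as :parened nodes rather than plain strings, which keeps them distinguishable from $(...) sub-expressions.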
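Update 3: the complete working parser
For anyone curious, here is the complete working parser, which goes beyond the scope of this question: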
(def parse
"expr - the top-level expression made up of cmds and sub-exprs. When multiple
cmds are present, it implies they should be successively piped.
cmd - a single command consisting of words.
sub-expr - a backticked or $(..)-style sub-expression to be evaluated inline.
parened - a grouping of words wrapped in parentheses, explicitly tokenized to
allow parentheses in cmds and disambiguate them from sub-expression
syntax."
(insta/parser
"expr = cmd (<space> <pipe> <space> cmd)*
cmd = words
<sub-expr> = <backtick> expr <backtick> | nestable-sub-expr
<nestable-sub-expr> = <dollar> <lparen> expr <rparen>
words = word (<space>* word)*
<word> = sub-expr | parened | word-chars
<word-chars> = #'[^ `$()|]+'
parened = lparen words rparen
<space> = #'[ ]+'
<pipe> = #'[|]'
<dollar> = <'$'>
<lparen> = '('
<rparen> = ')'
<backtick> = <'`'>"))
Example usage:
(parse "foo bar (qux) $(clj (map (partial * $(js 45 * 2)) (range 10))) `frob`")
Parses to:
[:expr [:cmd [:words "foo" "bar" [:parened "(" [:words "qux"] ")"] [:expr [:cmd [:words "clj" [:parened "(" [:words "map" [:parened "(" [:words "partial" "*" [:expr [:cmd [:words "js" "45" "*" "2"]]]] ")"] [:parened "(" [:words "range" "10"] ")"]] ")"]]]] [:expr [:cmd [:words "frob"]]]]]]
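Because the output is plain hiccup-style vectors, ordinary Clojure sequence functions can walk it. As an illustration, here is a hypothetical helper (command-names is my own name, not part of Instaparse or yetibot) that lists the first word of every :cmd node, run on a trimmed-down version of the tree above:

```clojure
;; Hypothetical helper: the command name (first word) of every :cmd node,
;; in depth-first order, outermost command first. The first word is
;; usually a string, but it can itself be a node (e.g. a sub-expression)
;; if the command name is computed.
(defn command-names [tree]
  (->> (tree-seq vector? seq tree)
       (filter #(and (vector? %) (= :cmd (first %))))
       (map (fn [[_ [_ first-word]]] first-word))))

;; A trimmed-down version of the parse tree shown above.
(def example-tree
  [:expr [:cmd [:words "foo" "bar"
                [:parened "(" [:words "qux"] ")"]
                [:expr [:cmd [:words "clj" "x"]]]
                [:expr [:cmd [:words "frob"]]]]]])

(command-names example-tree)
;; => ("foo" "clj" "frob")
```

Note that the words inside :parened are skipped because the filter only looks at :cmd nodes, so "qux" is not mistaken for a command.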
This is the parser for yetibot, a chat bot I wrote. It replaces the previous mess of hand-rolled, regex-based parsing.