scala - Parser combinators: does repsep allow backtracking?

Question

Consider a parser like this:

import scala.util.parsing.combinator.RegexParsers

object TestParser extends RegexParsers {
    // Skip only spaces and tabs, so that newlines remain significant
    override protected val whiteSpace = """[ \t]*""".r

    def eol = """(\r?\n)+""".r
    def item = "[a-zA-Z][a-zA-Z0-9-]*".r
    def list = "items:" ~> rep1sep(item,",")
    def constraints = "exclude:" ~> item

    def itemsDefinition = (rep1sep(list, eol) ~ repsep(constraints,eol))
}

If I try to parse this input (without the two exclude lines it works fine):

items: item1, item2, item3, item3, item4
items: item2, item3, item3, item5, item4    
items: item4, item5, item6, item10      
items: item1, item2, item3
exclude: item1
exclude: item2

I get the following error:

[5.5] failure: `items:' expected but `e' found

       exclude: item1
       ^
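To reproduce the failure, here is a minimal runnable sketch. It assumes the scala-parser-combinators module (`"org.scala-lang.modules" %% "scala-parser-combinators"`) is on the classpath; `ReproduceFailure` is a hypothetical driver name, and the input is the one above without leading indentation or trailing spaces, so the reported column may differ from `[5.5]` while the line is still 5.

```scala
// Assumes the scala-parser-combinators module is on the classpath.
import scala.util.parsing.combinator.RegexParsers

// The grammar from the question, with explicit Parser types added.
object TestParser extends RegexParsers {
  override protected val whiteSpace = """[ \t]*""".r

  def eol: Parser[String] = """(\r?\n)+""".r
  def item: Parser[String] = "[a-zA-Z][a-zA-Z0-9-]*".r
  def list: Parser[List[String]] = "items:" ~> rep1sep(item, ",")
  def constraints: Parser[String] = "exclude:" ~> item

  def itemsDefinition = rep1sep(list, eol) ~ repsep(constraints, eol)
}

object ReproduceFailure {
  val input: String = List(
    "items: item1, item2, item3, item3, item4",
    "items: item2, item3, item3, item5, item4",
    "items: item4, item5, item6, item10",
    "items: item1, item2, item3",
    "exclude: item1",
    "exclude: item2"
  ).mkString("\n")

  def result = TestParser.parseAll(TestParser.itemsDefinition, input)

  def main(args: Array[String]): Unit =
    println(result) // a failure reported on line 5, where `items:` was expected
}
```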

The problem is clearly this line:

def itemsDefinition = (rep1sep(list, eol) ~ repsep(constraints,eol))

Why doesn't it work? Does it have to do with backtracking? What alternatives can I use to make it work?

score 6 · Accepted Answer

You need an eol between your lists and your constraints:

(rep1sep(list, eol) <~ eol) ~ repsep(constraints, eol)
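A runnable sketch of the fix, under the same classpath assumption. The grammar is the question's with the answer's extra `<~ eol`; the explicit `Parser[...]` types, the `^^` mapping to a tuple, and the names `FixedParser` and `FixedDemo` are illustrative additions.

```scala
// Assumes the scala-parser-combinators module is on the classpath.
import scala.util.parsing.combinator.RegexParsers

object FixedParser extends RegexParsers {
  // Only spaces and tabs are skipped, so newlines stay significant.
  override protected val whiteSpace = """[ \t]*""".r

  def eol: Parser[String] = """(\r?\n)+""".r
  def item: Parser[String] = "[a-zA-Z][a-zA-Z0-9-]*".r
  def list: Parser[List[String]] = "items:" ~> rep1sep(item, ",")
  def constraints: Parser[String] = "exclude:" ~> item

  // The answer's fix: an extra eol consumes the newline between the
  // last `items:` line and the first `exclude:` line.
  def itemsDefinition: Parser[(List[List[String]], List[String])] =
    (rep1sep(list, eol) <~ eol) ~ repsep(constraints, eol) ^^ {
      case lists ~ excludes => (lists, excludes)
    }
}

object FixedDemo {
  val input: String = List(
    "items: item1, item2, item3, item3, item4",
    "items: item2, item3, item3, item5, item4",
    "items: item4, item5, item6, item10",
    "items: item1, item2, item3",
    "exclude: item1",
    "exclude: item2"
  ).mkString("\n")

  def result = FixedParser.parseAll(FixedParser.itemsDefinition, input)

  def main(args: Array[String]): Unit = {
    val (lists, excludes) = result.get
    println(lists.length) // 4
    println(excludes)     // List(item1, item2)
  }
}
```

One consequence worth checking: because the extra eol is mandatory, an input with no `exclude:` lines is then accepted only if it ends with a newline.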

Complete answer:

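Your grammar specifies eol as a separator between lists, not as a terminator. It would accept an input where the first exclude appears right after the last item3 (with whitespace, but not a newline).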
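This can be checked directly. In the sketch below (hypothetical object names and shortened inputs, same classpath assumption), the unmodified grammar accepts an `exclude:` that follows the last item on the same line, but rejects it when it starts on its own line.

```scala
// Assumes the scala-parser-combinators module is on the classpath.
import scala.util.parsing.combinator.RegexParsers

object SeparatorParser extends RegexParsers {
  override protected val whiteSpace = """[ \t]*""".r

  def eol: Parser[String] = """(\r?\n)+""".r
  def item: Parser[String] = "[a-zA-Z][a-zA-Z0-9-]*".r
  def list: Parser[List[String]] = "items:" ~> rep1sep(item, ",")
  def constraints: Parser[String] = "exclude:" ~> item

  // Unchanged from the question: eol only separates consecutive lists.
  def itemsDefinition = rep1sep(list, eol) ~ repsep(constraints, eol)

  def accepts(s: String): Boolean = parseAll(itemsDefinition, s).successful
}

object SameLineDemo {
  // First `exclude:` on the same line as the last item, after a space.
  val sameLine = "items: a, b\nitems: c exclude: a\nexclude: b"
  // The layout from the question: `exclude:` starts on its own line.
  val ownLine = "items: a, b\nitems: c\nexclude: a\nexclude: b"

  def main(args: Array[String]): Unit = {
    println(SeparatorParser.accepts(sameLine)) // true
    println(SeparatorParser.accepts(ownLine))  // false
  }
}
```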

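After your parser reaches the unwanted eol, it looks for items: and finds exclude instead. That produces the error message shown. The parser does then backtrack to just before that newline: it considers the possibility that the list part stopped there and looks for exclude:, but finds an eol instead. So another possible error message would have been "exclude: expected, eol found", which would be more helpful in this case. The parser reports the first one because that failure occurred further into the input.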
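The backtracking step can be observed by running only `rep1sep(list, eol)` with `parse`, which, unlike `parseAll`, does not require consuming all of the input. This is a sketch with a shortened input; `lists`, `BacktrackParser` and `BacktrackDemo` are hypothetical names.

```scala
// Assumes the scala-parser-combinators module is on the classpath.
import scala.util.parsing.combinator.RegexParsers

object BacktrackParser extends RegexParsers {
  override protected val whiteSpace = """[ \t]*""".r

  def eol: Parser[String] = """(\r?\n)+""".r
  def item: Parser[String] = "[a-zA-Z][a-zA-Z0-9-]*".r
  def list: Parser[List[String]] = "items:" ~> rep1sep(item, ",")
  def constraints: Parser[String] = "exclude:" ~> item

  // Just the first half of the question's itemsDefinition.
  def lists: Parser[List[List[String]]] = rep1sep(list, eol)
}

object BacktrackDemo {
  val input: String = "items: a, b\nitems: c\nexclude: a"

  // rep1sep consumes the last eol, fails on `exclude`, and gives the eol back.
  val afterLists = BacktrackParser.parse(BacktrackParser.lists, input)

  // Try `exclude:` exactly where rep1sep stopped: it sees the newline.
  val excludeThere = BacktrackParser.constraints(afterLists.next)

  def main(args: Array[String]): Unit = {
    println(afterLists.get)                // List(List(a, b), List(c))
    println(afterLists.next.first == '\n') // true
    println(excludeThere.successful)       // false
  }
}
```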

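When the grammar offers alternatives and none of them succeeds, the parser returns the error from the furthest position, which is usually the right strategy. Suppose your grammar allows either an "if" or a "for", and the input is "if !!!". On the if branch, the error would be something like "(" expected, "!" found. On the for branch, the message would be "for" expected, "if" found. Clearly the message from the if branch, which occurs at the second token, is much better than the one from the for branch, which occurs at the first token and is not relevant at all.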
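A toy version of this example, assuming a hypothetical grammar where a statement is either `if (x)` or `for (x)`. Both alternatives fail on `if !!!`, and the failure that gets reported is the one from the `if` branch, at column 4, where `(` was expected.

```scala
// Assumes the scala-parser-combinators module is on the classpath.
import scala.util.parsing.combinator.RegexParsers

// Hypothetical toy grammar; default whitespace handling (\s+) is kept.
object StmtParser extends RegexParsers {
  def ident: Parser[String] = "[a-z]+".r
  def ifStmt = "if" ~ "(" ~ ident ~ ")"
  def forStmt = "for" ~ "(" ~ ident ~ ")"
  def stmt = ifStmt | forStmt
}

object FurthestFailureDemo {
  // The if branch fails at column 4, the for branch at column 1.
  def result = StmtParser.parseAll(StmtParser.stmt, "if !!!")

  def main(args: Array[String]): Unit =
    println(result) // the failure from the if branch, at column 4
}
```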

As for the separator/terminator question, you could consider:

Separator (; in Pascal): repsep(item, separator)
Terminator (; in C): rep(item <~ terminator)
Flexible: repsep(item, separator) <~ separator?

The last one would also accept a lone separator with no items at all. If that is undesirable, consider (rep1sep(item, separator) <~ separator?)? instead.
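The styles can be compared on small inputs. This sketch assumes `;` as the separator or terminator and lowercase words as items; `SepStyles` and its method names are hypothetical. It writes the optional separator as `opt(";")`, which is equivalent to `";".?`, to avoid postfix notation.

```scala
// Assumes the scala-parser-combinators module is on the classpath.
import scala.util.parsing.combinator.RegexParsers

object SepStyles extends RegexParsers {
  // Hypothetical items: lowercase words.
  def item: Parser[String] = "[a-z]+".r

  // Separator (Pascal style): `;` only between items.
  def separated: Parser[List[String]] = repsep(item, ";")
  // Terminator (C style): `;` after every item.
  def terminated: Parser[List[String]] = rep(item <~ ";")
  // Flexible: separators plus one optional trailing `;`.
  def flexible: Parser[List[String]] = repsep(item, ";") <~ opt(";")
  // Flexible, but a lone `;` with no items is rejected.
  def flexibleNonEmpty: Parser[Option[List[String]]] =
    opt(rep1sep(item, ";") <~ opt(";"))

  def accepts[T](p: Parser[T], s: String): Boolean = parseAll(p, s).successful

  def main(args: Array[String]): Unit =
    for (s <- List("", "a;b", "a;b;", ";"))
      println(s"'$s': separated=${accepts(separated, s)} " +
        s"terminated=${accepts(terminated, s)} flexible=${accepts(flexible, s)} " +
        s"flexibleNonEmpty=${accepts(flexibleNonEmpty, s)}")
}
```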
