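Edit: I made a first version, and Eike helped me improve it a lot. I am now stuck on a more specific problem, which I describe below. You can look at the original question in the edit history.
I am using pyparsing to parse a small language used to request specific data from a database. It has numerous keywords, operators and data types, as well as boolean logic.
I am trying to improve the error messages sent to users when they make a syntax error, because the current ones are not very helpful. I put together a small example, similar to what I do with the language described above but much smaller: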
#!/usr/bin/env python
from pyparsing import *

def validate_number(s, loc, tokens):
    # Parse action: the number parsed fine, but only 0 is accepted.
    if int(tokens[0]) != 0:
        raise ParseFatalException(s, loc, "number must be 0")

def fail(s, loc, tokens):
    # Parse action for the catch-all `error` expression.
    raise ParseFatalException(s, loc, "Unknown token %s" % tokens[0])

def fail_value(s, loc, expr, err):
    # Fail action: called when `number` does not match at all.
    raise ParseFatalException(s, loc, "Wrong value")

number = Word(nums).setParseAction(validate_number).setFailAction(fail_value)
operator = Literal("=")
error = Word(alphas).setParseAction(fail)
rules = MatchFirst([
    Literal('x') + operator + number,
])
rules = operatorPrecedence(rules | error, [
    (Literal("and"), 2, opAssoc.RIGHT),
])

def try_parse(expression):
    try:
        rules.parseString(expression, parseAll=True)
    except Exception as e:
        msg = str(e)
        print("%s: %s" % (msg, expression))
        # Put the marker under the character at e.loc in the expression.
        print(" " * (len("%s: " % msg) + (e.loc)) + "^^^")
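So basically, the only thing we can do in this language is write a series of `x = 0`, joined together with `and` and parentheses.
Now, there are cases where error reporting is not very good when `and` and parentheses are used. Consider the following examples: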
>>> try_parse("x = a and x = 0") # This one is actually good!
Wrong value (at char 4), (line:1, col:5): x = a and x = 0
                                              ^^^
>>> try_parse("x = 0 and x = a")
Expected end of text (at char 6), (line:1, col:1): x = 0 and x = a
                                                         ^^^
>>> try_parse("x = 0 and (x = 0 and (x = 0 and (x = a)))")
Expected end of text (at char 6), (line:1, col:1): x = 0 and (x = 0 and (x = 0 and (x = a)))
                                                         ^^^
>>> try_parse("x = 0 and (x = 0 and (x = 0 and (xxxxxxxx = 0)))")
Expected end of text (at char 6), (line:1, col:1): x = 0 and (x = 0 and (x = 0 and (xxxxxxxx = 0)))
                                                         ^^^
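For reference, the `^^^` marker in these outputs is placed by adding the length of the `"<msg>: "` prefix to `e.loc`. Below is a minimal sketch of that arithmetic, pulled out into a helper so it can be checked without pyparsing. The name `format_error` is hypothetical and not part of the code above.

```python
def format_error(msg, expression, loc):
    """Return the two lines try_parse prints: the message line and the marker line."""
    # format_error is a hypothetical helper, not part of the original script.
    prefix = "%s: " % msg
    first = prefix + expression
    # The marker starts under character `loc` of the expression, so it is
    # shifted right by the length of the "<msg>: " prefix.
    second = " " * (len(prefix) + loc) + "^^^"
    return first, second


if __name__ == "__main__":
    line, marker = format_error(
        "Wrong value (at char 4), (line:1, col:5)", "x = a and x = 0", 4
    )
    print(line)
    print(marker)
```

With the first example's message and `loc` of 4, the marker starts under the `a` in `x = a`. In the other three examples the reported location is char 6, so the marker sits under the first `and` rather than under the token that is actually wrong.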
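In fact, it seems that once the parser cannot parse (and "parse" is the important word here) something that comes after an `and`, it no longer produces good error messages :(
And I do mean parse: if it can parse 5 but the "validation" in the parse action fails, it still produces a good error message. But if it cannot parse a valid number (like `a`) or a valid keyword (like `xxxxxx`), it stops producing the right error messages.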
Any ideas?