I'm implementing a parser for a fairly complex grammar with PyParsing. (Which, if I may add, is a real pleasure to use!)
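The grammar is somewhat "dynamic" in that it allows defining (various) alphabets, which in turn define the elements allowed in other definitions. For example:

```
alphabet: a b c
lists:
s1 = a b
s2 = b c x
```

Here, `alphabet` is meant to define the elements allowed in the definitions under `lists`. For example, `s1` would be valid, but `s2` contains an invalid `x`.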
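Stated without PyParsing, the rule is just a subset check of each list against the alphabet. The following minimal Python 3 sketch assumes the input has already been split into tokens; `findUnknown` is a hypothetical helper name used only for illustration.

```python
def findUnknown(alphabet, lists):
    """Map each invalid list name to the set of its elements not in the alphabet."""
    allowed = set(alphabet)
    return {
        name: set(elements) - allowed
        for name, elements in lists.items()
        if not set(elements) <= allowed  # subset check: every element allowed?
    }


alphabet = ["a", "b", "c"]
lists = {"s1": ["a", "b"], "s2": ["b", "c", "x"]}
print(findUnknown(alphabet, lists))  # {'s2': {'x'}}
```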
A simple PyParsing parser without this kind of validation might look like this:
```python
from pyparsing import Literal, lineEnd, Word, alphanums,\
    OneOrMore, Group, Suppress, dictOf

def fixedToken(literal):
    return Suppress(Literal(literal))

Element = Word(alphanums)
# ~lineEnd stops the repetition at the end of the current line
Alphabet = Group(OneOrMore(~lineEnd + Element))
AlphaDef = fixedToken("alphabet:") + Alphabet

ListLine = OneOrMore(~lineEnd + Element)
# dictOf(key, value): each "name = elements" line becomes a named result
Lists = dictOf(Word(alphanums) + fixedToken("="), ListLine)

Start = AlphaDef + fixedToken("lists:") + Lists

if __name__ == "__main__":
    data = """
alphabet: a b c
lists:
s1 = a b
s2 = b c x
"""
    res = Start.parseString(data)
    for k, v in sorted(res.items()):
        print k, "=", v
```
This parses the input and gives the output:
```
Alphabet= set(['a', 'c', 'b'])
s1 = ['a', 'b']
s2 = ['b', 'c', 'x']
```
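However, I would like the parser to raise a `ParseException` (or something similar) for `s2`, since it contains the invalid `x`. Ideally, I would like to be able to define `ListLine` as something like `OneOrMore(oneOf(Alphabet))`, but obviously that requires some dynamic interpretation that can only happen after `Alphabet` has actually been parsed and assembled.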
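One way to approximate that "dynamic" `ListLine` is a `Forward` placeholder that a parse action on `Alphabet` fills in once the alphabet is known; pyparsing's own `matchPreviousLiteral` helper is built on the same idea. This is a minimal Python 3 sketch reusing the grammar from the first listing, not a definitive implementation. `AlphaElement`, `defineAlphabet` and `isValid` are illustrative names, not PyParsing APIs. It uses `Keyword` alternatives instead of `oneOf` so that, for example, `ab` is not accepted as `a` followed by `b`.

```python
from pyparsing import (Forward, Group, Keyword, Literal, MatchFirst,
                       OneOrMore, ParseException, Suppress, Word,
                       alphanums, dictOf, lineEnd)


def fixedToken(literal):
    return Suppress(Literal(literal))


Element = Word(alphanums)
Alphabet = Group(OneOrMore(~lineEnd + Element))

# Placeholder for "one symbol of the alphabet"; its real definition is only
# known once the alphabet line has been parsed.
AlphaElement = Forward()


def defineAlphabet(toks):
    # toks[0] is the Group of alphabet symbols. Fill in the Forward so that
    # every later list element must be exactly one of these symbols.
    # (<< rather than <<= avoids rebinding a global name inside a function.)
    AlphaElement << MatchFirst([Keyword(sym) for sym in toks[0]])


Alphabet.addParseAction(defineAlphabet)
AlphaDef = fixedToken("alphabet:") + Alphabet

# Roughly the OneOrMore(oneOf(Alphabet)) wished for above.
ListLine = OneOrMore(~lineEnd + AlphaElement)
Lists = dictOf(Word(alphanums) + fixedToken("="), ListLine)
Start = AlphaDef + fixedToken("lists:") + Lists


def isValid(text):
    # An unknown element only makes ListLine (and then dictOf) stop early;
    # parseAll=True turns the unconsumed rest of the input into an error.
    try:
        Start.parseString(text, parseAll=True)
    except ParseException as err:
        print("Rejected:", err)
        return False
    return True


data = """
alphabet: a b c
lists:
s1 = a b
s2 = b c x
"""
print(isValid(data))  # prints the rejection message, then False
```

With this design the error location reported by `parseAll=True` lands on the first unknown element (`x` here) rather than on the start of the line.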
One solution I found is to add parse actions that (1) remember the alphabet and (2) validate each list line:
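```python
# ... (imports and definitions from the first listing, plus:)
from pyparsing import ParseException

Alphabet = Group(OneOrMore(~lineEnd + Element))

# Parse action 1: remember the alphabet once it has been parsed
def alphaHold(toks):
    alphaHold.alpha = set(*toks)
    print "Alphabet=", alphaHold.alpha

Alphabet.addParseAction(alphaHold)
AlphaDef = fixedToken("alphabet:") + Alphabet

ListLine = OneOrMore(~lineEnd + Element)

# Parse action 2: reject list lines containing elements outside the alphabet
def lineValidate(toks):
    unknown = set(toks).difference(alphaHold.alpha)
    if unknown:
        msg = "Unknown element(s): {}".format(unknown)
        print msg
        raise ParseException(msg)

ListLine.addParseAction(lineValidate)
# ...
```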
This gives almost the desired output:
```
Alphabet= set(['a', 'c', 'b'])
Unknown element(s): set(['x'])
s1 = ['a', 'b']
```
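Unfortunately, PyParsing catches the exception raised by the parse action: it treats the `ParseException` as an ordinary failed match, so `s2` is silently dropped from the result instead of aborting the parse, and the approach technically fails. Is there another way to achieve this in PyParsing that I might have missed?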
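One option worth trying is to raise `ParseFatalException` instead of `ParseException` from the validating parse action. PyParsing's repetition and alternation classes catch `ParseException` to mean "no match here", but a `ParseFatalException` is not caught that way, so it should abort the whole parse. (Passing `parseAll=True` to `parseString` is another way to make the skipped `s2` line an error.) This is a minimal Python 3 sketch of the parse-action version, reusing the names from the listings above. It assumes stopping at the first bad line is acceptable; it is not the only possible design.

```python
from pyparsing import (Group, Literal, OneOrMore, ParseFatalException,
                       Suppress, Word, alphanums, dictOf, lineEnd)


def fixedToken(literal):
    return Suppress(Literal(literal))


Element = Word(alphanums)
Alphabet = Group(OneOrMore(~lineEnd + Element))


def alphaHold(toks):
    # Remember the alphabet symbols for the validation step below.
    alphaHold.alpha = set(toks[0])


Alphabet.addParseAction(alphaHold)
AlphaDef = fixedToken("alphabet:") + Alphabet

ListLine = OneOrMore(~lineEnd + Element)


def lineValidate(s, loc, toks):
    # The (s, loc, toks) signature lets the error carry a position
    # (roughly the start of the offending list).
    unknown = set(toks).difference(alphaHold.alpha)
    if unknown:
        # Unlike ParseException, this is not treated as an ordinary
        # "no match", so it propagates out of parseString.
        raise ParseFatalException(
            s, loc, "Unknown element(s): {}".format(sorted(unknown)))


ListLine.addParseAction(lineValidate)
Lists = dictOf(Word(alphanums) + fixedToken("="), ListLine)
Start = AlphaDef + fixedToken("lists:") + Lists

data = """
alphabet: a b c
lists:
s1 = a b
s2 = b c x
"""

error = None
try:
    Start.parseString(data)
except ParseFatalException as err:
    error = str(err)  # keep the message; err itself is cleared after except
print(error)
```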