python - How to write a grammar that matches an expression only at the last occurrence of a keyword

Question

I want to write an expression grammar that matches strings like the following:

words at the start ONE|ANOTHER wordAtTheEnd

---------^-------- -----^----- -----^------
     A: alphas      B: choice   C: alphas

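The catch, however, is that part A may itself contain the keywords "ONE" or "ANOTHER" from part B, so only the last occurrence of a choice keyword should be matched as part B. Here is an example: the string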

ZERO ONE or TWO are numbers ANOTHER letsendhere

should be parsed into the fields

A: "ZERO ONE or TWO are numbers"
B: "ANOTHER"
C: "letsendhere"

With pyparsing, I tried the stopOn keyword argument of the OneOrMore expression:

import pyparsing as pp

choice = pp.Or([pp.Keyword("ONE"), pp.Keyword("ANOTHER")])('B')
start = pp.OneOrMore(pp.Word(pp.alphas), stopOn=choice)('A')
end = pp.Word(pp.alphas)('C')
expr = (start + choice) + end

But this does not work. For the example string, I get a ParseException:

Expected end of text (at char 12), (line:1, col:13)
"ZERO ONE or >!<TWO are numbers ANOTHER text"

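That makes sense, because stopOn stops at the first occurrence of choice rather than the last. How can I write a grammar that stops at the last occurrence instead? Do I perhaps need to resort to a context-sensitive grammar?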

score 1 · Accepted Answer

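Sometimes you have to try to "be the parser". What distinguishes the "last occurrence of X" from any other X? One way to put it is "an X that is not followed by any more X's". With pyparsing, you could write a helper method like this: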

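from pyparsing import FollowedBy, SkipTo

def last_occurrence_of(expr):
    # Match expr only if no further expr can be found later in the input
    return expr + ~FollowedBy(SkipTo(expr))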

Here it is used as the stopOn argument to OneOrMore:

from pyparsing import OneOrMore, Word, alphas, nums

integer = Word(nums)
word = Word(alphas)
list_of_words_and_ints = OneOrMore(integer | word, stopOn=last_occurrence_of(integer)) + integer

print(list_of_words_and_ints.parseString("sldkfj 123 sdlkjff 123 lklj lkj 2344 234 lkj lkjj"))

Prints:

['sldkfj', '123', 'sdlkjff', '123', 'lklj', 'lkj', '2344', '234']
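
Applied to the grammar from the question, the same helper might be used roughly as follows. This is a minimal sketch rather than part of the original answer: the Group wrapper around part A and the parseAll=True flag are assumptions added here so that the three fields are easy to inspect and the whole input must be consumed.

```python
import pyparsing as pp

def last_occurrence_of(expr):
    # Match expr only if no further expr can be found later in the input
    return expr + ~pp.FollowedBy(pp.SkipTo(expr))

word = pp.Word(pp.alphas)
choice = pp.Keyword("ONE") | pp.Keyword("ANOTHER")

# Part A: keep consuming words until the *last* choice keyword is reached.
# Group is an assumption made here so that result["A"] is a list of words.
start = pp.Group(pp.OneOrMore(word, stopOn=last_occurrence_of(choice)))

expr = start("A") + choice("B") + word("C")

result = expr.parseString(
    "ZERO ONE or TWO are numbers ANOTHER letsendhere", parseAll=True
)

print("A:", " ".join(result["A"]))  # A: ZERO ONE or TWO are numbers
print("B:", result["B"])            # B: ANOTHER
print("C:", result["C"])            # C: letsendhere
```

Note that the lookahead rescans the rest of the input at every candidate keyword, so on long inputs with many keyword occurrences the parse can take time quadratic in the input length.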
