pyparsing - Extracting a variable-length substring with pyparsing

Question

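I am trying to get pyparsing to extract a substring made up of a variable number of words from a string.

The following almost works, but it drops the last word of the substring: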

from pyparsing import Literal, OneOrMore, Word, alphas

text = "Joe F Bloggs is the author of this book."
author = OneOrMore(Word(alphas) + ~Literal("is the"))

print(author.parseString(text))

Output:

['Joe', 'F']

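What am I missing?

PS: I know I could do this with a regular expression, but I specifically want to do it with pyparsing, because it has to fit into a large body of work that is already written with pyparsing.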

score 1 · Accepted Answer

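Your negative lookahead has to come before the actual author word. When it is placed after the word, it rejects "Bloggs" because "is the" follows it, so the whole word-plus-lookahead pair fails and the last word is dropped: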

>>> author = OneOrMore(~Literal("is the") + Word(alphas))
>>> print(author.parseString(text))
['Joe', 'F', 'Bloggs']
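
For comparison, here is a minimal self-contained sketch that runs both versions side by side. The names stop, broken, broken_words and author_words are introduced only for illustration, and it assumes a pyparsing version that still accepts the camelCase parseString used in the question.

```python
from pyparsing import Literal, OneOrMore, Word, alphas

text = "Joe F Bloggs is the author of this book."

# Terminator phrase, kept as a single Literal as in the question.
# (The name "stop" is only for this illustration.)
stop = Literal("is the")

# Question's version: the lookahead is checked AFTER each word,
# so "Bloggs" is rejected because "is the" follows it.
broken = OneOrMore(Word(alphas) + ~stop)

# Answer's version: the lookahead is checked BEFORE each word,
# so repetition stops only once the next thing is "is the".
author = OneOrMore(~stop + Word(alphas))

broken_words = list(broken.parseString(text))
author_words = list(author.parseString(text))

print(broken_words)
print(author_words)
print(" ".join(author_words))
```

The last line joins the matched tokens back into one string, which is probably the form you want if the author name is used as a single value elsewhere.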
