python - Unexpected pyparsing behavior

Question

I ran into this unexpected behavior while trying to debug some pyparsing code:

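from pyparsing import CharsNotIn, LineEnd, Literal, ZeroOrMore

string1 = "this is a test string : that behaves as I expect\n"
string2 = "this string does not behave as I expect\n"

# A field is any run of characters that are neither a colon nor a newline.
field = CharsNotIn(":\n")
line = field + ZeroOrMore(Literal(":") + field) + LineEnd()

print(line.parseString(string1))
print(line.parseString(string2))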

This produces the following output:

['this is a test string ', ':', ' that behaves as I expect', '\n']
['this string does not behave as I expect']

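For some reason, the parser picks up the end-of-line character in string1 but not in string2. For string2, I can't even understand how it produces a match if it did not pick up the line end.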

This behavior seems to be specific to the end-of-line character, since using a character other than the line end appears to work fine:

string1 = "this is a test string : that behaves as I expect*"
string2 = "this string also behaves as I expect*"

field = CharsNotIn(":*")
line = field + ZeroOrMore(Literal(":") + field) + Literal("*")

print(line.parseString(string1))
print(line.parseString(string2))

This produces:

['this is a test string ', ':', ' that behaves as I expect', '*']
['this string also behaves as I expect', '*']

score 1 · Accepted Answer

Print line to see the pseudo-regex it matches.

>>> print(line)
{!W:(:
) [{":" !W:(:
)}]... LineEnd}

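If I understand this correctly, it looks for a run of non-colon, non-newline characters, which stops at the first newline (in your string2 example, that run takes up the whole line), then looks for a colon followed by more words if they are present (they aren't), and finally for a newline. My guess is that the newline is somehow being stripped, rather than that you are wrong to assert that the string would fail to match if the newline could not be matched.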
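
A minimal sketch that may help test the guess about the newline being stripped. It assumes the cause is pyparsing's default whitespace skipping: Literal(":") skips leading whitespace before it tries to match, and the default whitespace set includes "\n". The names colon and strict_line are hypothetical, and restricting the whitespace characters with setWhitespaceChars is only one possible workaround.

```python
from pyparsing import CharsNotIn, LineEnd, Literal, ZeroOrMore

field = CharsNotIn(":\n")

# Original grammar: Literal(":") skips leading whitespace, newline included.
line = field + ZeroOrMore(Literal(":") + field) + LineEnd()

# Assumed workaround: let the colon skip only spaces and tabs, so a
# newline is left in place for LineEnd() to match.
colon = Literal(":").setWhitespaceChars(" \t")
strict_line = field + ZeroOrMore(colon + field) + LineEnd()

string2 = "this string does not behave as I expect\n"

default_tokens = line.parseString(string2).asList()
strict_tokens = strict_line.parseString(string2).asList()

print(default_tokens)  # newline token missing, as in the question
print(strict_tokens)   # newline token present
```

If the second list ends with the newline token while the first does not, that would suggest the newline is consumed as skippable whitespace while the parser looks for the optional colon.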

>>> print(line.parseString('xyzyy'))
['xyzyy']

That still leaves the question of why it matches even when there is no newline...
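
On that open question: as far as I know, LineEnd() also succeeds at the very end of the input string, where it contributes no token, which would account for the match without a newline. A small sketch to check this, with expr as a hypothetical name:

```python
from pyparsing import CharsNotIn, LineEnd

expr = CharsNotIn(":\n") + LineEnd()

# No trailing newline: LineEnd() is tried at the end of the input.
no_newline = expr.parseString("xyzyy").asList()

# Trailing newline: LineEnd() is tried at the newline character.
with_newline = expr.parseString("xyzyy\n").asList()

print(no_newline)
print(with_newline)
```

Comparing the two lists shows whether the newline token appears only when a newline is actually present in the input.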
