python - python re.findall strange behavior

Question

>>> import re
>>> text =\
... """xyxyxy testmatch0
... xyxyxy testmatch1
... xyxyxy
... whyisthismatched1
... xyxyxy testmatch2
...  xyxyxy testmatch3
... xyxyxy
... whyisthismatched2
... """
>>> re.findall(r"^\s*xyxyxy\s+([a-z0-9]+).*$", text, re.MULTILINE)
[u'testmatch0', u'testmatch1', u'whyisthismatched1', u'testmatch2', u'testmatch3', u'whyisthismatched2']
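To see where the unexpected results come from, it can help to print the whole match (group 0) rather than only the captured group. The sketch below assumes Python 3 and reuses the same text and pattern; `repr()` makes any newline inside a match visible.

```python
import re

text = """xyxyxy testmatch0
xyxyxy testmatch1
xyxyxy
whyisthismatched1
xyxyxy testmatch2
 xyxyxy testmatch3
xyxyxy
whyisthismatched2
"""

pattern = re.compile(r"^\s*xyxyxy\s+([a-z0-9]+).*$", re.MULTILINE)

# Collect the full text of every match (group 0), not just group 1.
whole_matches = [m.group(0) for m in pattern.finditer(text)]

# repr() shows "\n" explicitly, so a match spanning two lines stands out.
for whole in whole_matches:
    print(repr(whole))
```

The third and sixth matches come out as 'xyxyxy\nwhyisthismatched1' and 'xyxyxy\nwhyisthismatched2': each match starts on a bare "xyxyxy" line and continues onto the next one.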

So I expected the lines containing "whyisthismatched" not to be matched.

The Python re documentation says:

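. (Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.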
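That documentation describes only the dot. A quick check (a sketch assuming Python 3.4+ for `re.fullmatch`) shows that `.` does behave as documented, while `\s` matches a newline with no flags at all, since "\n" counts as whitespace:

```python
import re

# Without DOTALL, "." refuses to match a newline...
dot_plain = re.fullmatch(r".", "\n")
# ...and with DOTALL it does.
dot_dotall = re.fullmatch(r".", "\n", re.DOTALL)
# "\s" matches a newline regardless of flags.
ws = re.fullmatch(r"\s", "\n")

print(dot_plain is None)       # True
print(dot_dotall is not None)  # True
print(ws is not None)          # True
```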

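My question is whether this really is the intended behavior or a bug. If it is intended, could someone please explain why these lines match, and how I should change my pattern to get the behavior I expect: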

[u'testmatch0', u'testmatch1', u'testmatch2', u'testmatch3']

score 6 · Accepted Answer

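As far as the \s character class is concerned, a newline is whitespace too. So on a line that contains only "xyxyxy", \s+ consumes the line break and the group captures the first word of the next line; the dot in .* is not involved. If you only want to match spaces, you need to match [ ] instead: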

>>> re.findall(r"^\s*xyxyxy[ ]+([a-z0-9]+).*$", text, re.MULTILINE)
[u'testmatch0', u'testmatch1', u'testmatch2', u'testmatch3']
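If tabs should also count as separators, one possible variation (not from the original answer) is the class [^\S\n], meaning "any whitespace except a newline". The sketch below assumes Python 3. It also replaces the leading \s* so that the whole match stays on a single line.

```python
import re

# [^\S\n] = "not (non-whitespace or newline)": spaces and tabs match,
# line breaks do not.
PATTERN = re.compile(r"^[^\S\n]*xyxyxy[^\S\n]+([a-z0-9]+).*$", re.MULTILINE)

text = """xyxyxy testmatch0
xyxyxy testmatch1
xyxyxy
whyisthismatched1
xyxyxy testmatch2
 xyxyxy testmatch3
xyxyxy
whyisthismatched2
"""

result = PATTERN.findall(text)
print(result)  # ['testmatch0', 'testmatch1', 'testmatch2', 'testmatch3']

# Unlike [ ]+, this also accepts a tab between xyxyxy and the word.
tabbed = PATTERN.findall("xyxyxy\ttabbed\n")
print(tabbed)  # ['tabbed']
```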
