python - Regex date matching: getting "27 January 1756" from "Date of death\n27 January 1756" without returning the "Date of death" part of the string

Question

Suppose the following string is given:

stri = "Date 26 March 1256\nDate of death\n27 January 1756\n25 January 1567\n death"

Now I only want to extract the date that immediately follows Date of death, i.e. 27 January 1756.

I got this far:

>>> regex = re.compile(r"Date of death.*?[0-9][0-9]? [A-z]+ [0-9]{4}", re.DOTALL)
>>> print(regex.findall(stri))
['Date of death\n27 January 1756']

But I want only 27 January 1756 back, using a single regex search.

score 4 · Accepted Answer

You need to put a capturing group (parentheses) around the part of the match that you want findall to return:

>>> regex = re.compile(r"Date of death.*?([0-9][0-9]? [A-z]+ [0-9]{4})", re.DOTALL)
>>> print(regex.findall(stri))
['27 January 1756']
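
To make the difference concrete, here is a small side-by-side sketch of findall with and without the group, using the string from the question. The variable names whole and dates are only illustrative.

```python
import re

stri = "Date 26 March 1256\nDate of death\n27 January 1756\n25 January 1567\n death"

# No group: findall returns the entire match, label included.
whole = re.findall(r"Date of death.*?[0-9][0-9]? [A-z]+ [0-9]{4}", stri, re.DOTALL)

# Exactly one group: findall returns only the text captured by that group.
dates = re.findall(r"Date of death.*?([0-9][0-9]? [A-z]+ [0-9]{4})", stri, re.DOTALL)

print(whole)  # ['Date of death\n27 January 1756']
print(dates)  # ['27 January 1756']
```

With more than one group in the pattern, findall would return a list of tuples instead, so keep a single group if you want a flat list of strings.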

score 2 · Accepted Answer

Use a lookbehind instead:

regex = re.compile(r"(?<=Date of death\n)[0-9][0-9]? [A-z]+ [0-9]{4}", re.DOTALL)

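This checks that the current position is preceded by Date of death\n without actually including that text in the match.

Note that you can no longer use .*? here, because most regex engines (Python's re module among them) do not support variable-length lookbehinds.

You can also shorten your regex slightly by using the built-in character class \d: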
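
As a quick check of that restriction in Python's re module, the sketch below tries to compile a lookbehind containing .*? and records the resulting error. The variable name lookbehind_error is only illustrative.

```python
import re

try:
    # .*? can match any number of characters, so the lookbehind has no fixed width.
    re.compile(r"(?<=Date of death.*?)\d{1,2} [A-Za-z]+ \d{4}", re.DOTALL)
    lookbehind_error = None
except re.error as exc:
    # The standard library rejects the pattern at compile time.
    lookbehind_error = str(exc)

print(lookbehind_error)
```

If the text between the label and the date can vary in length, a capturing group is the simpler option in the standard library.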

regex = re.compile(r"(?<=Date of death\n)\d{1,2} [A-z]+ \d{4}", re.DOTALL)
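
One side remark on the character class carried over from the question: [A-z] covers the whole ASCII range from A to z, which also includes the characters [ \ ] ^ _ and the backtick. The sketch below compares it with [A-Za-z]; the names loose, strict and bogus are made up for this illustration.

```python
import re

stri = "Date 26 March 1256\nDate of death\n27 January 1756\n25 January 1567\n death"
bogus = "Date of death\n27 Jan_uary 1756"  # invented input with a stray underscore

loose = re.compile(r"(?<=Date of death\n)\d{1,2} [A-z]+ \d{4}")
strict = re.compile(r"(?<=Date of death\n)\d{1,2} [A-Za-z]+ \d{4}")

real_match = strict.findall(stri)    # the date from the question is still found
loose_bogus = loose.findall(bogus)   # [A-z] accepts the underscore
strict_bogus = strict.findall(bogus) # [A-Za-z] rejects it

print(real_match)    # ['27 January 1756']
print(loose_bogus)   # ['27 Jan_uary 1756']
print(strict_bogus)  # []
```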

score 1 · Accepted Answer

Use a capturing group.

regex = re.compile(r"Date of death\s+([0-9]{1,2} [A-z]+ [0-9]{4})", re.DOTALL)

answered 2012-10-24T13:06:47.280

score 1 · Accepted Answer

How about this:

In [64]: m=re.search(r"(?<=Date of death)\s+(\d+ \w+ \d+)",stri)

In [65]: m.groups()
Out[65]: ('27 January 1756',)

In [66]: m.groups()[0]
Out[66]: '27 January 1756'
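
Since re.search returns None when nothing matches, a small wrapper can make that case explicit. This is only a sketch: the names DEATH_DATE and find_death_date are hypothetical, and the pattern is a slightly stricter variant of the one above.

```python
import re

# Label, whitespace (including the newline), then the captured date.
DEATH_DATE = re.compile(r"Date of death\s+(\d{1,2} [A-Za-z]+ \d{4})")

def find_death_date(text):
    """Return the date following 'Date of death', or None if there is none."""
    m = DEATH_DATE.search(text)
    # group(1) is the first capturing group, the same value as m.groups()[0].
    return m.group(1) if m else None

stri = "Date 26 March 1256\nDate of death\n27 January 1756\n25 January 1567\n death"
print(find_death_date(stri))             # 27 January 1756
print(find_death_date("no dates here"))  # None
```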
