0

Suppose I have the following string:

stri = "Date 26 March 1256\nDate of death\n27 January 1756\n25 January 1567\n death"

Now I want to extract only the date that immediately follows "Date of death", i.e. 27 January 1756.

This is as far as I got:

>>> import re
>>> regex = re.compile(r"Date of death.*?[0-9][0-9]? [A-Za-z]+ [0-9]{4}", re.DOTALL)
>>> print(regex.findall(stri))
['Date of death\n27 January 1756']

But I want to get only 27 January 1756, using a single regex search.

4 Answers

4

You need to put a capturing group (parentheses) around the part of the match that you want findall to return:

>>> regex = re.compile(r"Date of death.*?([0-9][0-9]? [A-Za-z]+ [0-9]{4})", re.DOTALL)
>>> print(regex.findall(stri))
['27 January 1756']
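
As a rough sketch of why the parentheses matter (assuming Python 3 and the string from the question; the names without_group, with_group and first_date are only illustrative): when the pattern has no group, findall returns each whole match, and when it has exactly one group, findall returns only the text that group captured.

```python
import re

stri = "Date 26 March 1256\nDate of death\n27 January 1756\n25 January 1567\n death"

# No capturing group: findall returns the whole match, label included.
without_group = re.findall(
    r"Date of death.*?[0-9]{1,2} [A-Za-z]+ [0-9]{4}", stri, re.DOTALL
)

# One capturing group: findall returns only the text captured by that group.
with_group = re.findall(
    r"Date of death.*?([0-9]{1,2} [A-Za-z]+ [0-9]{4})", stri, re.DOTALL
)

# If only the first hit is needed, re.search plus group(1) gives the same value.
match = re.search(
    r"Date of death.*?([0-9]{1,2} [A-Za-z]+ [0-9]{4})", stri, re.DOTALL
)
first_date = match.group(1) if match else None

print(without_group)  # ['Date of death\n27 January 1756']
print(with_group)     # ['27 January 1756']
print(first_date)     # 27 January 1756
```

Guarding match against None is a precaution for input that does not contain the label at all.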
answered 2012-10-24T13:06:37.400
2

Use a lookbehind instead:

regex = re.compile(r"(?<=Date of death\n)[0-9][0-9]? [A-Za-z]+ [0-9]{4}", re.DOTALL)

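This checks that the current position is preceded by "Date of death\n" without including that text in the match.

Note that you can no longer use .*? here, because most regex engines, including Python's re module, do not support variable-length lookbehinds.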

You can also shorten your regex slightly by using the built-in character class \d:

regex = re.compile(r"(?<=Date of death\n)\d{1,2} [A-Za-z]+ \d{4}", re.DOTALL)
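
To illustrate the fixed-width restriction, here is a small sketch, assuming Python 3 and the string from the question (the names fixed, dates and variable_width_ok are made up for this example). The fixed-width lookbehind compiles and matches, while a lookbehind containing .*? is rejected by the re module when the pattern is compiled.

```python
import re

stri = "Date 26 March 1256\nDate of death\n27 January 1756\n25 January 1567\n death"

# Fixed-width lookbehind: "Date of death\n" is always 14 characters long.
fixed = re.compile(r"(?<=Date of death\n)\d{1,2} [A-Za-z]+ \d{4}")
dates = fixed.findall(stri)

# Variable-width lookbehind: Python's re module raises re.error at compile time.
try:
    re.compile(r"(?<=Date of death.*?)\d{1,2} [A-Za-z]+ \d{4}", re.DOTALL)
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(dates)              # ['27 January 1756']
print(variable_width_ok)  # False
```

Because the lookbehind text is not consumed, findall returns the bare date without needing a capturing group.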
answered 2012-10-24T13:07:11.697
1

Use a capturing group.

regex = re.compile(r"Date of death.*?([0-9]{1,2} [A-Za-z]+ [0-9]{4})", re.DOTALL)
answered 2012-10-24T13:06:47.280
1

How about this:

In [64]: m = re.search(r"(?<=Date of death)\s+(\d+ \w+ \d+)", stri)

In [65]: m.groups()
Out[65]: ('27 January 1756',)

In [66]: m.groups()[0]
Out[66]: '27 January 1756'
answered 2012-10-24T13:19:28.810