python - How do I include a special part in a regular expression to find English names with Python's re.findall?

Question

I have the following Python code to search for all English names:

import re

a = "Bonds met Susann (&quot;Sun&quot;) Margreth Branco, the mother of his first two children, in {{city-state|Montreal|Quebec}} in August 1987. They eloped in {{city-state|Las Vegas|Nevada}} Barry Bonds"

re.findall(r"(?:[A-Z][a-z'.]+\s*){1,4}", a)

I want it to return:

['Bonds', 'Susann (&quot;Sun&quot;) Margreth Branco', 'Montreal', 'Quebec', 'August', 'They', 'Las Vegas','Nevada','Barry Bonds']

My code does not return this. How can I modify the regular expression to get the result I want?
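For reference, here is what the original pattern actually returns on `a`. The `\s*` keeps trailing spaces on some matches, and because `&` is not in `[a-z'.]`, the `&quot;` entity breaks the name into `'Susann '`, `'Sun'` and `'Margreth Branco'`. The variable name `matches` is only illustrative.

```python
import re

a = ("Bonds met Susann (&quot;Sun&quot;) Margreth Branco, the mother of his first two "
     "children, in {{city-state|Montreal|Quebec}} in August 1987. They eloped in "
     "{{city-state|Las Vegas|Nevada}} Barry Bonds")

# The question's original pattern: 1 to 4 capitalised words, each followed by optional whitespace.
matches = re.findall(r"(?:[A-Z][a-z'.]+\s*){1,4}", a)
print(matches)
# ['Bonds ', 'Susann ', 'Sun', 'Margreth Branco', 'Montreal', 'Quebec', 'August ', 'They ', 'Las Vegas', 'Nevada', 'Barry Bonds']
```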

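I should add that I also tried another regular expression, (?:(([A-Z][a-z'.]+)|(\(&quot.*"\)))\s*){1,4}. I tested it on regexpal.com, where it found what I wanted, but in Python it does not: it returns Susann, ("Sun") Margreth and Branco as three separate results, whereas I want Susann ("Sun") Margreth Branco as a single result.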
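One likely reason the second pattern behaves differently in Python than on regexpal.com (which shows whole matches) is that it contains capturing groups `(...)`. When a pattern has capturing groups, `re.findall` returns the captured groups (a string per match for one group, a tuple per match for several) rather than the whole match. The small sketch below uses a toy string, not the question's data, to show this; making every group non-capturing `(?:...)`, or using `re.finditer` with `group(0)`, returns whole matches.

```python
import re

text = "Barry Bonds met Susann"

# One capturing group: findall returns only that group's text for each match.
one_group = re.findall(r"([A-Z])[a-z]+", text)
# Several capturing groups: findall returns a tuple of groups for each match.
two_groups = re.findall(r"([A-Z])([a-z]+)", text)
# Only non-capturing groups: findall returns the whole match.
whole = re.findall(r"(?:[A-Z])[a-z]+", text)
# finditer + group(0) gives the whole match even when capturing groups are present.
via_finditer = [m.group(0) for m in re.finditer(r"([A-Z])[a-z]+", text)]

print(one_group)     # ['B', 'B', 'S']
print(two_groups)    # [('B', 'arry'), ('B', 'onds'), ('S', 'usann')]
print(whole)         # ['Barry', 'Bonds', 'Susann']
print(via_finditer)  # ['Barry', 'Bonds', 'Susann']
```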

score 1 · Accepted Answer

As you mentioned, tokens containing "&quot" can also be treated as separators:

import re
re.findall(r"[A-Z][a-z]*(?:(?:\S*&quot\S*|\s)+[A-Z][a-z]*){0,3}", "Bonds met Susann (&quot;Sun&quot;) Margreth Branco, the mother of his first two children, in {{city-state|Montreal|Quebec}} in August 1987. They eloped in {{city-state|Las Vegas|Nevada}} Barry Bonds")
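To make the answer's pattern easier to read, here is the same expression compiled with `re.VERBOSE` so each part can carry a comment. The behaviour should be unchanged; only the layout differs. The names `NAME_RE` and `names` are illustrative.

```python
import re

a = ("Bonds met Susann (&quot;Sun&quot;) Margreth Branco, the mother of his first two "
     "children, in {{city-state|Montreal|Quebec}} in August 1987. They eloped in "
     "{{city-state|Las Vegas|Nevada}} Barry Bonds")

# Same pattern as the answer; re.VERBOSE ignores the layout whitespace and comments.
NAME_RE = re.compile(r"""
    [A-Z][a-z]*            # a capitalised word starts the name
    (?:                    # then zero to three more capitalised words,
        (?:
            \S*&quot\S*    # each preceded by tokens containing &quot, e.g. (&quot;Sun&quot;)
          | \s             # or by single whitespace characters
        )+
        [A-Z][a-z]*
    ){0,3}
""", re.VERBOSE)

names = NAME_RE.findall(a)
print(names)
```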

Output:

['Bonds', 'Susann (&quot;Sun&quot;) Margreth Branco', 'Montreal', 'Quebec', 'August', 'They', 'Las Vegas', 'Nevada', 'Barry Bonds']
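A different approach, not taken from the answer, is to decode the HTML entities first with the standard library's `html.unescape`, so `&quot;` becomes a literal `"`, and then allow an optional parenthesised, quoted nickname between capitalised words. This is a sketch that assumes nicknames always look like `("...")`; note that the result then contains real quotes instead of `&quot;`.

```python
import html
import re

a = ("Bonds met Susann (&quot;Sun&quot;) Margreth Branco, the mother of his first two "
     "children, in {{city-state|Montreal|Quebec}} in August 1987. They eloped in "
     "{{city-state|Las Vegas|Nevada}} Barry Bonds")

# Turn &quot; into a real double quote before matching.
text = html.unescape(a)

# A capitalised word, then up to three more, each optionally preceded by a ("nickname").
NICK_NAME_RE = re.compile(r'[A-Z][a-z]*(?:\s+(?:\("[^"]*"\)\s+)?[A-Z][a-z]*){0,3}')

unescaped_names = NICK_NAME_RE.findall(text)
print(unescaped_names)
# ['Bonds', 'Susann ("Sun") Margreth Branco', 'Montreal', 'Quebec', 'August', 'They', 'Las Vegas', 'Nevada', 'Barry Bonds']
```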
