I'm having trouble with another regular expression. For this one, my code should look for the pattern:

re.compile(r"kill(?:ed|ing|s)\D*(\d+).*?(?:men|women|children|people)?")

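However, it matches too aggressively. It does match a sentence that contains the word "kill", but the pattern then keeps consuming text until it reaches a digit much further down. In particular, it matches: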

killed in an apparent u.s. drone attack on a car in yemen on sunday, tribal sources and local officials said.the men's car was driving through the south-eastern province of maareb, a mostly desert region where militants have taken refuge after being driven from southern strongholds.yemen, where al qaeda militants exploited a security vacuum during last year's uprising that ousted president ali abdullah saleh, has seen an in10

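This is not the behavior I'm after. If the pattern cannot be found within a single sentence, I want the match to fail.

The solution I'm trying to implement, in pseudocode: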

find instance of 'kill'
if what follows contains a period (\.) before a digit, do not match.

My failed implementation looks like this:

re.compile(r"kill(?:ed|ing|s)\D*(?!:\..*?)(\d+).*?(?:men|women|children|people)?")

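I tried a lookbehind, but that requires a fixed width. What I was attempting above is to match any ending of 'kill', followed by any non-digits but not a period, with anything else free to follow before the number I'm after.

Sadly, this code behaves exactly the same in my tests. Any help would be appreciated.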
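
For reference, a minimal Python check of the two patterns above. The `text` string is a shortened, made-up stand-in for the article (an assumption, not the real input), and the names `original` and `attempted` are only illustrative. Both patterns return the same span: `\D*` is greedy and also consumes periods, so whenever a digit follows, the lookahead `(?!:\..*?)` is tested against that digit, never finds a `:` there, and therefore always succeeds.

```python
import re

# Shortened, hypothetical stand-in for the article text (not the real input).
text = (
    "three militants were killed in an apparent drone attack on sunday. "
    "the country has seen an increase of 10 attacks."
)

original = re.compile(r"kill(?:ed|ing|s)\D*(\d+).*?(?:men|women|children|people)?")
attempted = re.compile(
    r"kill(?:ed|ing|s)\D*(?!:\..*?)(\d+).*?(?:men|women|children|people)?"
)

m_original = original.search(text)
m_attempted = attempted.search(text)

# Both matches run across the sentence boundary and capture the same number.
print(m_original.group(0))
print(m_attempted.group(0) == m_original.group(0))
```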

1 Answer

A small modification:

r"kill(?:ed|ing|s)[^\d.]*(\d+)[^.]*?(?:men|women|children|people)?"

Basically, I prevent a period from being matched between kill and the men/women/etc. that follows.
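
A quick sketch of how this pattern behaves in Python. The sample sentences and the variable names are made up for illustration and are not taken from the question's data.

```python
import re

fixed = re.compile(
    r"kill(?:ed|ing|s)[^\d.]*(\d+)[^.]*?(?:men|women|children|people)?"
)

# Hypothetical sample sentences, made up for illustration.
same_sentence = "a drone strike killed 5 men in yemen."
next_sentence = "three militants were killed on sunday. the country has seen 10 attacks."
abbreviation = "killed in an apparent u.s. drone attack that left 3 dead."

m = fixed.search(same_sentence)
print(m.group(1))                   # the number found in the same sentence
print(fixed.search(next_sentence))  # None: a period comes before the first digit
print(fixed.search(abbreviation))   # None: the periods in "u.s." also block the match
```

One caveat with this approach: every period counts as a sentence boundary, so an abbreviation such as "u.s." between kill and the number causes a miss as well.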

Answered 2012-11-06T02:04:47.240