I'm having some trouble with another regular expression. For this one, my code should look for the pattern:
re.compile(r"kill(?:ed|ing|s)\D*(\d+).*?(?:men|women|children|people)?")
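However, it matches far too aggressively. It does find a sentence containing the word "killed", but the pattern then keeps consuming text until it reaches a number much further down. In particular, it matches: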
killed in an apparent u.s. drone attack on a car in yemen on sunday, tribal sources and local officials said.the men's car was driving through the south-eastern province of maareb, a mostly desert region where militants have taken refuge after being driven from southern strongholds.yemen, where al qaeda militants exploited a security vacuum during last year's uprising that ousted president ali abdullah saleh, has seen an in10
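This is not the behavior I'm after. If the pattern can't be found within a single sentence, I want the match to fail.
The solution I'm trying to implement, in pseudocode, is: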
find instance of 'kill'
if what follows contains a period (\.) before a digit, do not match.
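One way the pseudocode above might be sketched is to swap `\D*` for a character class that also excludes the period, so the gap between the "kill" word and the number can never cross a `.`. This is only a sketch under the assumption that a literal period always ends a sentence; the name `sentence_bound` and both sample strings are made up for illustration.

```python
import re

# Assumption: a literal "." marks the end of a sentence. [^\d.]* consumes
# anything except digits and periods, so the text between the "kill" word
# and the number cannot contain a period.
sentence_bound = re.compile(
    r"kill(?:ed|ing|s)[^\d.]*(\d+).*?(?:men|women|children|people)?"
)

# Hypothetical sample strings, shortened for illustration.
crosses_sentence = "killed in a drone attack on sunday. yemen has seen an in10"
same_sentence = "the strike killed at least 10 people"

# The number sits in a later sentence, so no match is expected here.
print(sentence_bound.search(crosses_sentence))

# The number sits in the same sentence, so the digits are captured.
hit = sentence_bound.search(same_sentence)
print(hit.group(1) if hit else None)
```

Because every period counts as a boundary here, abbreviations such as "u.s." or a decimal point between the "kill" word and the number would also block a match.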
My failed implementation looks like this:
re.compile(r"kill(?:ed|ing|s)\D*(?!:\..*?)(\d+).*?(?:men|women|children|people)?")
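I tried a lookbehind, but that requires me to specify a fixed width. What I was trying to do above is match any of the 'kill' endings, followed by any non-digits but without matching a period, and otherwise free to be followed by anything else before the number I'm after.
Sadly, this code behaves exactly the same in my tests. Any help would be appreciated.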
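A minimal way to reproduce the comparison, assuming a shortened stand-in for the article text (the `sample` string and the names `original` and `attempt` are hypothetical), might look like this:

```python
import re

original = re.compile(
    r"kill(?:ed|ing|s)\D*(\d+).*?(?:men|women|children|people)?"
)
attempt = re.compile(
    r"kill(?:ed|ing|s)\D*(?!:\..*?)(\d+).*?(?:men|women|children|people)?"
)

# Hypothetical, shortened stand-in for the article text quoted earlier.
sample = "killed in a drone attack on sunday. yemen has seen an in10"

# Print the full match and the captured number for each pattern.
for name, pattern in (("original", original), ("attempt", attempt)):
    m = pattern.search(sample)
    print(name, repr(m.group(0)), m.group(1))
```

Both patterns run across the period and capture the same number. As far as I can tell, `(?!:` is read as a negative lookahead whose first literal is a colon, not as the non-capturing group `(?:`, so it only rejects a colon at that position and never constrains the period.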