Setup: two texts that represent the cases of interest:
text = "He lives in Nidarvoll and tonight i must reach a train to Oslo at 6 oclock. The system, called BusTUC is built upon the classical system CHAT-80 (Warren and Pereira, 1982). CHAT-80 was a state of the art natural language system that was impressive on its own merits."
t2 = "He lives in Nidarvoll and tonight i must reach a train to Oslo at 6 oclock. The system, called BusTUC is built upon the classical system CHAT-80 (Warren and Pereira, 1982) fgbhdr was a state of the art natural. CHAT-80 was a state of the art natural language system that was impressive on its own merits."
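First, match the case where the citation sits at the end of the sentence:
# raw string; [A-Za-z]+ matches the first author name, [0-9]+ the year
p1 = r"\. (.*\([A-Za-z]+ .* [0-9]+\)\.+?)"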
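Then match the case where the citation is not at the end of the sentence:
# [^\.]+ consumes the rest of the sentence after the closing parenthesis
p2 = r"\. (.*\([A-Za-z]+ .* [0-9]+\)[^\.]+\.+?)"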
Combine the two cases with the '|' regular-expression operator:
import re
p_main = re.compile(r"\. (.*\([A-Za-z]+ .* [0-9]+\)\.+?)"
                    r"|\. (.*\([A-Za-z]+ .* [0-9]+\)[^\.]+\.+?)")
Running it:
>>> print(re.findall(p_main, text))
[('The system, called BusTUC is built upon the classical system CHAT-80 (Warren and Pereira, 1982).', '')]
>>> print(re.findall(p_main, t2))
[('', 'The system, called BusTUC is built upon the classical system CHAT-80 (Warren and Pereira, 1982) fgbhdr was a state of the art natural.')]
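In both cases you get the sentence that contains the citation. Because the pattern has two groups, findall returns a tuple per match, with the group that did not take part left as an empty string.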
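If you would rather have a flat list of sentences than tuples with an empty slot, something along these lines might work. It is only a sketch: `citing_sentences` is a made-up helper name, and I am assuming `A-Za-z` is the intended character class.

```python
import re

# Same two alternatives as in the answer, written as raw strings.
# Assumption: A-Za-z is the intended character class.
p_main = re.compile(
    r"\. (.*\([A-Za-z]+ .* [0-9]+\)\.+?)"
    r"|\. (.*\([A-Za-z]+ .* [0-9]+\)[^\.]+\.+?)"
)


def citing_sentences(s):
    """Return the sentences in s that contain a citation (hypothetical helper)."""
    # Each findall() row is a 2-tuple in which exactly one group is non-empty;
    # `a or b` picks whichever alternative actually matched.
    return [a or b for a, b in p_main.findall(s)]


if __name__ == "__main__":
    sample = "A b. The system CHAT-80 (Warren and Pereira, 1982) was old. Next one."
    print(citing_sentences(sample))
```

Note that both alternatives start with `\. `, so a citing sentence at the very beginning of the text (with no preceding period and space) will not be picked up.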
A good resource is the Python regular expression documentation and the accompanying Regex HOWTO page.
Cheers