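I am struggling to write a regular expression that extracts the part of a string between periods, while ignoring a period that is preceded by specific letters (for example, CO. in the example below). We can assume that the relevant chunk always ends with "LTD".
Case 1: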
string = "FREDS CHIP SHOP. S & B SERVICES CO. & SONS LTD. 1-12 THE STREET"
I want: "S & B SERVICES CO. & SONS LTD."
Case 2:
string = "SOME TEXT. BUSINESS NAME LTD. 1-12 THE STREET"
I want: "BUSINESS NAME LTD."
Case 3:
string = "SIMPLE BUSINESS NAME LTD. 1-12 THE STREET"
I want: "SIMPLE BUSINESS NAME LTD."
This is what I currently have:
#!/usr/bin/python
import sys
import re

vnumber_name = "FREDS CHIP SHOP. S & B SERVICES CO. & SONS LTD. 1-12 THE STREET"
#vnumber_name = "SOME TEXT. BUSINESS NAME LTD. 1-12 THE STREET"
#vnumber_name = "SIMPLE BUSINESS NAME LTD. 1-12 THE STREET"

def test(vnumber_name):
    # Earlier attempt: greedy, so it grabs everything from the start up to LTD.
    #ltd = re.search(r'.+\sLTD[.]?', vnumber_name)
    # Current attempt: requires a leading period and does not really skip "CO.".
    ltd = re.search(r'[.?][\s]{1,2}(?:[^.]+|(?!CO.))LTD[.]?', vnumber_name)
    if ltd:
        print("got it: " + ltd.group(0))
    else:
        print("nothing")

test(vnumber_name)
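This does not give the right result: for case 1 it matches only ". & SONS LTD.", and for case 3 it prints "nothing" because there is no period before the name.
I could write some if clauses, but it would be great to get this with a single regular expression.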
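To make the goal concrete, here is a minimal sketch of the kind of one-line pattern I am after, not a definitive solution. It assumes that "CO" is the only abbreviation whose period should be kept, and that the name starts either at the beginning of the string or right after a period followed by whitespace. The names `LTD_NAME` and `extract_ltd_name` are hypothetical and used only for illustration.

```python
import re

# Assumption: "CO" is the only abbreviation whose trailing period belongs
# to the name; the lookbehind would need extending for other abbreviations.
LTD_NAME = re.compile(
    r"(?:^|\.\s+)"              # start of string, or a period followed by whitespace
    r"((?:[^.]|(?<=\bCO)\.)*?"  # any non-period character, or a period right after the word CO
    r"\bLTD\.?)"                # the chunk ends with LTD and an optional period
)

def extract_ltd_name(text):
    """Return the chunk ending in LTD, or None if there is no match."""
    match = LTD_NAME.search(text)
    return match.group(1) if match else None

if __name__ == "__main__":
    samples = [
        "FREDS CHIP SHOP. S & B SERVICES CO. & SONS LTD. 1-12 THE STREET",
        "SOME TEXT. BUSINESS NAME LTD. 1-12 THE STREET",
        "SIMPLE BUSINESS NAME LTD. 1-12 THE STREET",
    ]
    for sample in samples:
        print(extract_ltd_name(sample))
```

Because `re.search` returns the leftmost position at which the whole pattern can match, any starting point that would have to cross a period not preceded by CO is rejected, and the search moves on to the next candidate start.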