You can use re.split():
re.split('[.!]', text)
This splits on any character in the [...] character class:
>>> import re
>>> text = 'Hi, my name is Will, And i am from Canada. I have 2 pets. One is a dog and the other is a Zebra. Ahoi! Thanks.'
>>> re.split('[.!]', text)
['Hi, my name is Will, And i am from Canada', ' I have 2 pets', ' One is a dog and the other is a Zebra', ' Ahoi', ' Thanks', '']
You can wrap the split expression in a capturing group to include the delimiter characters in the output as separate list elements:
>>> re.split('([.!])', text)
['Hi, my name is Will, And i am from Canada', '.', ' I have 2 pets', '.', ' One is a dog and the other is a Zebra', '.', ' Ahoi', '!', ' Thanks', '.', '']
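If you want each piece of text next to the delimiter that ended it, the alternating list from the grouped split can be paired up. This is only a sketch: the shortened `text` and the names `parts` and `pairs` are made up for illustration.

```python
import re

# Shortened sample text, assumed here for illustration only.
text = 'I have 2 pets. Ahoi! Thanks.'

# With a capturing group, re.split() alternates text and delimiter:
# [text, delim, text, delim, ..., trailing text]
parts = re.split('([.!])', text)

# Even indices hold the text, odd indices hold the delimiter that followed it.
# zip() stops at the shorter sequence, so the trailing '' is dropped.
pairs = list(zip(parts[0::2], parts[1::2]))
print(pairs)
```

Note that any trailing text with no delimiter after it would also be dropped by `zip()` here.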
To keep the punctuation attached to each sentence, use re.findall() instead:
>>> re.findall('[^.!]+?[.!]', text)
['Hi, my name is Will, And i am from Canada.', ' I have 2 pets.', ' One is a dog and the other is a Zebra.', ' Ahoi!', ' Thanks.']
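The results above keep the leading space of each sentence, and the pattern skips trailing text that has no closing punctuation. One possible variation, assuming you want stripped sentences and want to keep an unterminated tail, is sketched below. The helper name `split_sentences` is hypothetical and not part of the `re` module.

```python
import re

def split_sentences(text):
    """Hypothetical helper: split on '.' or '!' and keep the punctuation."""
    # [^.!]+ matches the sentence body; [.!]? makes the terminator optional,
    # so a final fragment without punctuation is still returned.
    matches = re.findall(r'[^.!]+[.!]?', text)
    # Strip surrounding whitespace and drop whitespace-only fragments.
    return [m.strip() for m in matches if m.strip()]

print(split_sentences('I have 2 pets. Ahoi! Thanks'))
```

Repeated terminators such as `!!` are not merged by this sketch; only the first one stays with the sentence.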