I want to open a file and extract the sentences from it. The sentences in the file run across line breaks, like this:
"He said, 'I'll pay you five pounds a week if I can have it on my own
terms.' I'm a poor woman, sir, and Mr. Warren earns little, and the
money meant much to me. He took out a ten-pound note, and he held it
out to me then and there.
Currently I'm using this code:
text = ' '.join(file_to_open.readlines())
sentences = re.split(r' *[\.\?!][\'"\)\]]* *', text)
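readlines cuts through the sentences, because the line breaks end up in the middle of them. Is there a good way to solve this and get only the sentences? (Without NLTK.)
Thanks for your attention.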
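One possible approach, as a minimal sketch and not a complete sentence tokenizer: read the whole text, collapse all whitespace (including newlines) into single spaces, temporarily hide the period of known abbreviations, and then split on whitespace that follows sentence-ending punctuation. The function name `split_sentences`, the `ABBREVIATIONS` tuple, and the `\x00` placeholder are assumptions made up for this illustration; the abbreviation list would need extending for real text.

```python
import re

# Hypothetical, deliberately short abbreviation list (an assumption for this sketch).
ABBREVIATIONS = ("Mrs.", "Mr.", "Dr.")

# Split on whitespace that follows ., ? or !, optionally with one closing
# quote or bracket in between. Two fixed-width lookbehinds are used because
# Python's re module does not allow variable-width lookbehind.
SENTENCE_BOUNDARY = re.compile(r'''(?:(?<=[.?!])|(?<=[.?!]['")\]]))\s+''')

def split_sentences(text, abbreviations=ABBREVIATIONS):
    # Collapse newlines and repeated spaces so line breaks no longer cut sentences.
    text = " ".join(text.split())
    # Hide the period of each known abbreviation behind a placeholder character.
    for abbr in abbreviations:
        text = text.replace(abbr, abbr[:-1] + "\x00")
    parts = SENTENCE_BOUNDARY.split(text)
    # Restore the hidden periods and drop empty pieces.
    return [part.replace("\x00", ".") for part in parts if part]
```

With a file, this could be called as `split_sentences(f.read())` inside the `with open(...)` block. Because the split uses lookbehind, the punctuation and closing quotes stay attached to each sentence instead of being discarded.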
The current problem:
import re

file_to_read = 'test.txt'
with open(file_to_read) as f:
    text = f.read()

word_list = ['Mrs.', 'Mr.']
for i in word_list:
    text = re.sub(i, i[:-1], text)
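What I get (in my test case) is that Mrs. is changed to Mr. while Mr. is just Mr. I tried several other things, but none of them seemed to work. The answer is probably simple, but I'm missing it.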
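A likely cause, sketched here under the assumption that the goal is only to drop the trailing period: in a regular expression an unescaped `.` matches any character. The pattern `'Mrs.'` first turns "Mrs." into "Mrs", and then the pattern `'Mr.'` matches "Mrs" (with `.` matching the "s") and turns it into "Mr". Escaping each abbreviation with `re.escape` makes the period literal. The function name `strip_abbreviation_periods` is hypothetical.

```python
import re

def strip_abbreviation_periods(text, abbreviations=("Mrs.", "Mr.")):
    for abbr in abbreviations:
        # re.escape turns "Mr." into the pattern "Mr\." so the dot only
        # matches a literal period, not any character.
        text = re.sub(re.escape(abbr), abbr[:-1], text)
    return text
```

Since no pattern matching is really needed here, `text.replace(abbr, abbr[:-1])` would be a simpler alternative that avoids regular expressions entirely.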