Given a string:
c = 'A problem. She said: "I don\'t know about it."'
and trying to tokenize it into sentences:
>>> from nltk.tokenize import sent_tokenize
>>> for sindex, sentence in enumerate(sent_tokenize(c)):
...     print(str(sindex) + ": " + sentence)
...
0: A problem.
1: She said: "I don't know about it.
2: "
>>>
Why does NLTK put the closing quotation mark of the second sentence into a third sentence of its own? Is there a way to correct this behavior?
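
For reference, the only workaround I can think of so far is to post-process the tokenizer output. This is a minimal sketch, not an NLTK feature: `merge_dangling_closers` and `CLOSERS` are hypothetical names of my own, and it assumes a fragment made up solely of closing quotes or brackets always belongs to the preceding sentence. It works on a plain list, so here it is fed the exact output shown above rather than calling `sent_tokenize` again.

```python
# Characters treated as "closers" (an assumption; extend as needed).
CLOSERS = set('"\'’”»)]}')


def is_only_closers(fragment):
    """Return True if the fragment is non-empty and holds only closing marks."""
    stripped = fragment.strip()
    return bool(stripped) and all(ch in CLOSERS for ch in stripped)


def merge_dangling_closers(sentences):
    """Hypothetical helper: glue closer-only fragments onto the previous sentence."""
    merged = []
    for sentence in sentences:
        if merged and is_only_closers(sentence):
            # Attach the stray quote/bracket to the sentence it closes.
            merged[-1] += sentence.strip()
        else:
            merged.append(sentence)
    return merged


# The list below is the tokenizer output from the session above.
fragments = ['A problem.', 'She said: "I don\'t know about it.', '"']
for sindex, sentence in enumerate(merge_dangling_closers(fragments)):
    print(str(sindex) + ": " + sentence)
# 0: A problem.
# 1: She said: "I don't know about it."
```

This only hides the symptom after the fact, so I would still prefer a way to make the tokenizer itself keep the closing quote with its sentence.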