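I'm new to programming and am teaching myself from a book and Stack Overflow. I'm trying to remove the repeated \n characters from a chat corpus and then tokenize the text into sentences. If I don't remove the \n, the strings look like this: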
['answers for 10-19-20sUser139 ... hi 10-19-20sUser101 ;)\n\n\n\n\n\n\n\n\n\nI like it when you do it, 10-19-20sUser83\n\n\n\n\n\n\n\n\n\n\n\niamahotnipwithpics\n\n\n\n10-19-20sUser20 go plan the wedding!']
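I have tried several different approaches, such as chomps, line, rstrip, and so on, but none of them seem to work. I may well be using them incorrectly. The full code looks like this: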
import nltk, re, pprint
from nltk.corpus import nps_chat
chat = nltk.Text(nps_chat.words())
from nltk.corpus import NPSChatCorpusReader
from bs4 import BeautifulSoup
chat = nltk.corpus.nps_chat.raw()
soup = BeautifulSoup(chat)
soup.get_text()
text = soup.get_text()
print(text[:40])
print(len(text))
from nltk.tokenize import sent_tokenize
sent_chat = sent_tokenize(text)
len(sent_chat)
text[:] = [line.rstrip('\n') for line in text]
print(len(sent_chat))
print(sent_chat[:40])
When I use the line approach, I get this error:
Traceback (most recent call last):
  File "C:\Python34\Lib\idlelib\testsubjects\sentencelen.py", line 57, in <module>
    text[:] = [line.rstrip('\n') for line in text]
TypeError: 'str' object does not support item assignment
Any help?
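For context on the error: Python strings are immutable, so a slice assignment such as text[:] = ... raises this TypeError. Iterating over a string also yields single characters rather than lines, and rstrip('\n') only strips newlines at the very end of a string. A minimal sketch of one possible cleanup follows. It uses only the standard library, and the names raw, collapse_newlines and split_posts are hypothetical, chosen for illustration. The sample string is copied from the output shown above. The sketch builds new strings instead of modifying the original.

```python
import re

# Sample copied from the output shown in the question.
raw = ('answers for 10-19-20sUser139 ... hi 10-19-20sUser101 ;)\n\n\n\n\n\n\n\n\n\n'
       'I like it when you do it, 10-19-20sUser83\n\n\n\n\n\n\n\n\n\n\n\n'
       'iamahotnipwithpics\n\n\n\n10-19-20sUser20 go plan the wedding!')


def collapse_newlines(text):
    """Return a NEW string in which every run of whitespace that
    contains a newline is replaced by a single space."""
    return re.sub(r'\s*\n\s*', ' ', text).strip()


def split_posts(text):
    """Return the non-empty lines of text as a list of stripped strings."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Strings cannot be changed in place, so bind the result to a name.
cleaned = collapse_newlines(raw)
posts = split_posts(raw)

print(cleaned)
print(posts)
```

In the script above, the cleanup would presumably need to run on text before sent_tokenize(text) is called, since cleaning text afterwards does not change the already-built sent_chat list. If each chat post should stay a separate item, the list from split_posts may be a better starting point than one joined string.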