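I'm working on a sentiment analysis project on a book. I'm using nltk.sentiment.vader.SentimentIntensityAnalyzer to record the sentiment polarity of paragraphs in the Harry Potter series.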
To build the paragraphs and remove the line breaks, I did:
with open('HP1 Sorcerer of Stone.txt', 'r') as text_file:
    text = str(text_file.readlines())
# str.replace returns a new string, so the result has to be assigned back
text = text.replace('\\n"', "").replace("\'", "").replace(" , ", "")
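This splits the book into paragraphs. The problem comes with dialogue.
Each character's line of speech is separated by the same paragraph break as ordinary narrative, so a single conversation ends up spread over many elements: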
' "So?" snapped Mrs. Dursley. ',
' "Well, I just thought... maybe... it was something to do with... you
know... her crowd." ',
' Mrs. Dursley sipped her tea through pursed lips. Mr. Dursley wondered
whether he dared tell her he\'d heard the name "Potter." He decided he
didn\'t dare. Instead he said, as casually as he could, "Their son --
he\'d be about Dudley\'s age now, wouldn\'t he?" ',
' "I suppose so," said Mrs. Dursley stiffly. ',
' "What\'s his name again? Howard, isn\'t it?" ',
' "Harry. Nasty, common name, if you ask me." ',
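The surrounding quotes, commas and escaped apostrophes in the output above come from calling str() on the list returned by readlines(): that produces the list's repr, not the paragraphs themselves. A minimal sketch of keeping the paragraphs as a real Python list instead, assuming (as the output suggests) that each paragraph sits on a single line of the file. The helper name paragraphs_from_lines is hypothetical:

```python
from typing import Iterable, List


def paragraphs_from_lines(lines: Iterable[str]) -> List[str]:
    """Turn raw file lines into a list of paragraph strings.

    Assumption: each paragraph is stored on exactly one line of the file,
    as the readlines() output suggests. Blank lines are skipped.
    """
    paragraphs: List[str] = []
    for line in lines:
        cleaned = line.strip()  # drop the trailing newline and surrounding spaces
        if cleaned:
            paragraphs.append(cleaned)
    return paragraphs
```

For example, `with open('HP1 Sorcerer of Stone.txt', encoding='utf-8') as text_file: paragraphs = paragraphs_from_lines(text_file)` yields one string per paragraph, with no list-repr quoting left to strip out (the utf-8 encoding is an assumption about the file).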
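How can I change my splitting method so that a whole conversation stays together as one element? That entire conversation would then be passed to the intensity analyzer as a single input.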
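One possible approach, sketched under stated assumptions: treat any paragraph that contains a quotation mark as part of a conversation, and merge runs of consecutive such paragraphs into one string. This "contains a quote" rule is a heuristic chosen for illustration, not something defined by the book or by NLTK, and the function names is_dialogue and group_dialogue are hypothetical. It expects the paragraphs as a Python list of strings (for example, the lines from readlines() with whitespace stripped):

```python
from typing import List

# Straight and curly double quotes; the curly ones are an assumption in case
# the text file uses typographic quotation marks.
QUOTE_CHARS = ('"', "\u201c", "\u201d")


def is_dialogue(paragraph: str) -> bool:
    """Heuristic: a paragraph containing a double quotation mark is dialogue."""
    return any(q in paragraph for q in QUOTE_CHARS)


def group_dialogue(paragraphs: List[str]) -> List[str]:
    """Merge runs of consecutive dialogue paragraphs into a single element.

    Narrative paragraphs without quotation marks stay as separate elements.
    """
    grouped: List[str] = []
    in_dialogue = False  # True while the last element of grouped is a conversation
    for paragraph in paragraphs:
        if in_dialogue and is_dialogue(paragraph):
            # Extend the current conversation instead of starting a new element.
            grouped[-1] = grouped[-1] + " " + paragraph
        else:
            grouped.append(paragraph)
            in_dialogue = is_dialogue(paragraph)
    return grouped
```

With this rule, every paragraph in the excerpt above (including the narrative one that quotes "Potter." and Mr. Dursley's question) becomes one element. Each element can then be scored with `SentimentIntensityAnalyzer().polarity_scores(element)`, which returns a dict with 'neg', 'neu', 'pos' and 'compound' keys (the 'vader_lexicon' resource must be downloaded first). A known limitation: two separate conversations with no quote-free paragraph between them would be merged.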