With the code below (not mine), you can determine which words the VADER lexicon classifies as positive, negative, and neutral:
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# One-time downloads, if you don't already have them:
# nltk.download('punkt'); nltk.download('vader_lexicon')

sentence = 'Again, human interaction needs to have resolutions. Your reps cannot BLAME the system and shrug off being able to help. Let alone blame the system and not know WHY the system makes indiscriminate decisions.'
tokenized_sentence = nltk.word_tokenize(sentence)

sid = SentimentIntensityAnalyzer()
pos_word_list = []
neu_word_list = []
neg_word_list = []

# Score each token in isolation and bucket it by its compound score
for word in tokenized_sentence:
    if sid.polarity_scores(word)['compound'] >= 0.1:
        pos_word_list.append(word)
    elif sid.polarity_scores(word)['compound'] <= -0.1:
        neg_word_list.append(word)
    else:
        neu_word_list.append(word)
print('Positive:',pos_word_list)
print('Neutral:',neu_word_list)
print('Negative:',neg_word_list)
score = sid.polarity_scores(sentence)
print('\nScores:', score)
Running this code produces the following:
Positive: ['help']
Neutral: ['Again', ',', 'human', 'interaction', 'needs', 'to', 'have', 'resolutions', '.', 'Your', 'reps', 'can', 'not', 'the', 'system', 'and', 'shrug', 'off', 'being', 'able', 'to', '.', 'Let', 'the', 'system', 'and', 'not', 'know', 'WHY', 'the', 'system', 'makes', 'indiscriminate', 'decisions', '.']
Negative: ['BLAME', 'alone', 'blame']
We can then go to the VADER lexicon .txt file and look up the scores assigned to your words: blame is scored -1.4, alone is -1.0, and help is +1.7. That should produce a negative overall score, but the word "cannot" appears right before "blame", which negates the word's negative valence and flips it toward positive instead. VADER is clever enough to recognize negation, but it cannot tie that to the overall structure of the sentence (and neither can most alternatives).
As for an overview of how VADER works: it sums the sentiment intensities of the individual words across the sentence to produce an overall score. VADER builds in some nuances that go beyond a traditional bag-of-words classifier, including handling of negators and booster words. For the word-level sentiment scores, you'll find a detailed explanation here.