python - Example of how NLTK's VADER scores text

Question

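I'd like someone to correct my understanding of how VADER scores text. I read an explanation of the process here, but when I recreate the process it describes, I can't get my compound score for a test sentence to match VADER's output. Suppose we have this sentence: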

"I like using VADER, its a fun tool to use"

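The words VADER picks up are "like" (+1.5) and "fun" (+2.3). According to the documentation, these values are summed (giving +3.8) and then normalized to a range between 0 and 1 with the following function: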

$$\frac{x}{x^2 + \alpha}, \qquad \alpha = 15$$

With our numbers, this should become:

$$\frac{3.8}{3.8^2 + 15} = \frac{3.8}{14.44 + 15} \approx 0.1291$$
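The arithmetic above can be checked in a few lines of Python. This sketch implements the formula exactly as written in the question (no square root); `question_normalize` is a hypothetical name used only for illustration, not part of NLTK.

```python
# The normalization exactly as stated in the question: x / (x^2 + alpha).
# Hypothetical helper for illustration; this is NOT VADER's own function.
def question_normalize(x, alpha=15):
    return x / (x * x + alpha)


score = 1.5 + 2.3  # valences of "like" and "fun" as quoted above
print(round(question_normalize(score), 3))  # → 0.129
```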

However, VADER returns the following scores, with a compound score of 0.7003:

Scores: {'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.7003}

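Where did my reasoning go wrong? Similar questions have been asked many times, but none of them includes a worked example of how VADER actually scores a text. Any help would be appreciated.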

score 7 · Accepted Answer

Your normalization is simply wrong. In the source code, the function is defined as:

```python
import math

def normalize(score, alpha=15):
    """
    Normalize the score to be between -1 and 1 using an alpha that
    approximates the max expected value
    """
    norm_score = score/math.sqrt((score*score) + alpha)
    return norm_score
```

So you get 3.8/sqrt(3.8*3.8 + 15) = 3.8/sqrt(29.44) ≈ 0.7003, which matches VADER's compound score.
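To check the whole output dictionary, not just the compound field, here is a minimal sketch that re-creates VADER's scoring for this one sentence. It is a simplified approximation, not NLTK's implementation: `TOY_LEXICON` and `toy_polarity_scores` are hypothetical names, the lexicon holds only the two valences quoted in the question, and VADER's rules for boosters, negation, "but", capitalization and "!"/"?" emphasis are omitted on the assumption that none of them fire for this sentence. The tokenization (surrounding punctuation stripped, single-character tokens such as "I" and "a" dropped) and the pos/neu/neg bookkeeping follow my reading of the NLTK source and should be treated as assumptions.

```python
import math

# Hypothetical toy lexicon: only the two valences quoted in the question.
# Every other token is treated as neutral (valence 0).
TOY_LEXICON = {"like": 1.5, "fun": 2.3}


def normalize(score, alpha=15):
    """Map an unbounded valence sum into (-1, 1): score / sqrt(score^2 + alpha)."""
    return score / math.sqrt(score * score + alpha)


def toy_polarity_scores(text, lexicon=TOY_LEXICON):
    """Hypothetical, simplified re-creation of VADER's output for plain text.

    Assumes: whitespace tokenization, surrounding punctuation stripped,
    single-character tokens dropped, no booster/negation/emphasis rules.
    """
    tokens = [t.strip(".,!?;:") for t in text.split()]
    tokens = [t for t in tokens if len(t) > 1]
    valences = [lexicon.get(t.lower(), 0.0) for t in tokens]

    # pos/neg: each non-zero valence is pushed 1 further from zero;
    # neu: each zero-valence token counts as 1.
    pos_sum = sum(v + 1 for v in valences if v > 0)
    neg_sum = sum(v - 1 for v in valences if v < 0)
    neu_count = sum(1 for v in valences if v == 0)
    total = pos_sum + abs(neg_sum) + neu_count
    if total == 0:  # empty input: nothing to score
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}

    return {
        "neg": round(abs(neg_sum / total), 3),
        "neu": round(neu_count / total, 3),
        "pos": round(pos_sum / total, 3),
        # compound: the plain valence sum, squashed by normalize()
        "compound": round(normalize(sum(valences)), 4),
    }


print(toy_polarity_scores("I like using VADER, its a fun tool to use"))
# → {'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.7003}
```

Under these assumptions, the eight remaining tokens give pos_sum = (1.5 + 1) + (2.3 + 1) = 5.8 and neu_count = 6, so pos = 5.8 / 11.8 and neu = 6 / 11.8. This is why the pos/neu split is not simply the fraction of sentiment-bearing words.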
