I have a dictionary named "reviews":
reviews = {1: {'like': [0.0005, 0.0025], 'the': [0.5, 0.5], 'acting': [0.5, 0.5]},
           2: {'plot': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'story': [0.5, 0.5]}}
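For each review in the dictionary (1 and 2 in this example), I need to apply two formulas iteratively over the values of its words. Each word's value is a list [neg, pos]. The formulas compute the 'neg_post_prob' and 'pos_post_prob' of each review.
The formulas are:
- 'neg_post_prob' = (neg_prior * neg) / (neg_prior * neg + pos_prior * pos)
- 'pos_post_prob' = (pos_prior * pos) / (neg_prior * neg + pos_prior * pos)
where:
- 'neg_prior' is the 'neg_post_prob' computed in the previous word iteration, and
- 'pos_prior' is the 'pos_post_prob' computed in the previous word iteration.
For the first word of each review, both priors should equal 0.5.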
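To make the update rule concrete, here is a minimal sketch of the iteration for a single review. The function name `posterior` and its `prior` parameter are made up for illustration, and it assumes each word's value is ordered as [neg, pos], which is how the hand-written calculation in this post uses the numbers.

```python
def posterior(word_likelihoods, prior=0.5):
    """Sequentially update the (neg, pos) probabilities over a review's words.

    word_likelihoods: iterable of [neg, pos] pairs, one per word
                      (ordering assumed from the question's own arithmetic).
    """
    neg_prior, pos_prior = prior, prior  # both priors start at 0.5
    for neg, pos in word_likelihoods:
        # Shared denominator of both formulas
        evidence = neg_prior * neg + pos_prior * pos
        # The posteriors of this word become the priors for the next word
        neg_prior, pos_prior = neg_prior * neg / evidence, pos_prior * pos / evidence
    return neg_prior, pos_prior


# Review 1: "like", "the", "acting"
neg_post, pos_post = posterior([[0.0005, 0.0025], [0.5, 0.5], [0.5, 0.5]])
print(round(neg_post, 2), round(pos_post, 2))  # prints: 0.17 0.83
```

Words with equal values such as [0.5, 0.5] leave the probabilities unchanged, so only "like" moves the result for review 1.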
Here is my hand-written code for reviews 1 and 2:
#Review 1:
# the prior before starting the iteration is 0.5
prior = 0.5
# priors after the first word "like"
neg_prior_like = (prior*0.0005) / (prior * 0.0005 + prior * 0.0025)
pos_prior_like = (prior*0.0025) / (prior * 0.0005 + prior * 0.0025)
# priors after the second word "the"
neg_prior_like_the = (neg_prior_like * 0.5) / (neg_prior_like * 0.5 + pos_prior_like * 0.5)
pos_prior_like_the = (pos_prior_like * 0.5) / (neg_prior_like * 0.5 + pos_prior_like * 0.5)
# post_prob after last word "acting"
neg_post_prob = (neg_prior_like_the * 0.5) / (neg_prior_like_the * 0.5 + pos_prior_like_the * 0.5)
pos_post_prob = (pos_prior_like_the * 0.5) / (neg_prior_like_the * 0.5 + pos_prior_like_the * 0.5)
validation = neg_post_prob + pos_post_prob
#Review 2:
# the prior before starting the iteration is 0.5
prior = 0.5
# priors after the first word "plot"
neg_prior_plot = (prior*0.5) / (prior * 0.5 + prior * 0.5)
pos_prior_plot = (prior*0.5) / (prior * 0.5 + prior * 0.5)
# priors after the second word "hate"
neg_prior_plot_hate = (neg_prior_plot * 0.0029) / (neg_prior_plot * 0.0029 + pos_prior_plot * 0.0002)
pos_prior_plot_hate = (pos_prior_plot * 0.0002) / (neg_prior_plot * 0.0029 + pos_prior_plot * 0.0002)
# post_prob after last word "story"
neg_post_prob = (neg_prior_plot_hate * 0.5) / (neg_prior_plot_hate * 0.5 + pos_prior_plot_hate * 0.5)
pos_post_prob = (pos_prior_plot_hate * 0.5) / (neg_prior_plot_hate * 0.5 + pos_prior_plot_hate * 0.5)
validation = neg_post_prob + pos_post_prob
However, I would like to compute this automatically for every review and get the result in this form:
import pandas as pd

sentiment = {'review': [1, 2],
             'neg_post_prob': [0.17, 0.94],
             'pos_post_prob': [0.83, 0.06],
             'validation': [1, 1]}
sentiment = pd.DataFrame(sentiment, columns=['review', 'neg_post_prob', 'pos_post_prob', 'validation'])
print(sentiment)
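One possible way to get there, sketched under the assumption that each word's value is ordered as [neg, pos]: loop over the reviews, run the word-by-word update inside, and collect the rounded results into a dictionary of lists with the same keys as the desired output. Rounding to two decimals is my own choice here so that the numbers match the table; it is not part of the formulas.

```python
reviews = {1: {'like': [0.0005, 0.0025], 'the': [0.5, 0.5], 'acting': [0.5, 0.5]},
           2: {'plot': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'story': [0.5, 0.5]}}

sentiment = {'review': [], 'neg_post_prob': [], 'pos_post_prob': [], 'validation': []}

for review_id, words in reviews.items():
    neg_prior = pos_prior = 0.5  # priors before the first word
    for neg, pos in words.values():
        evidence = neg_prior * neg + pos_prior * pos
        # Posteriors of this word are the priors of the next one
        neg_prior, pos_prior = neg_prior * neg / evidence, pos_prior * pos / evidence
    sentiment['review'].append(review_id)
    sentiment['neg_post_prob'].append(round(neg_prior, 2))
    sentiment['pos_post_prob'].append(round(pos_prior, 2))
    sentiment['validation'].append(round(neg_prior + pos_prior, 2))

print(sentiment)
# {'review': [1, 2], 'neg_post_prob': [0.17, 0.94], 'pos_post_prob': [0.83, 0.06], 'validation': [1.0, 1.0]}
```

The resulting dictionary can be passed to pd.DataFrame in the same way as the hand-built one. Note that 'validation' comes out as 1.0 (a float) rather than 1.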
