
I have a dictionary named reviews:

reviews= {1: {'like': [0.0005, 0.0025], 'the': [0.5, 0.5], 'acting': [0.5, 0.5]},
          2: {'plot': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'story': [0.5, 0.5]}}

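For each review in the dictionary (1 and 2 in this example), I need to apply two formulas iteratively over the values of its words. Each word maps to a [neg, pos] pair, and the formulas compute each review's 'neg_post_prob' and 'pos_post_prob'.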

The formulas are:

  1. 'neg_post_prob' = (neg_prior * neg) / (neg_prior * neg + pos_prior * pos)
  2. 'pos_post_prob' = (pos_prior * pos) / (neg_prior * neg + pos_prior * pos)

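where:

  • 'neg_prior' is the 'neg_post_prob' computed for the previous word,
  • 'pos_prior' is the 'pos_post_prob' computed for the previous word, and
  • neg and pos are the two values of the current word.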

For the first word of each review, both priors should be 0.5.
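One step of this update can be written as a small function. This is only a sketch: the name bayes_step and the (neg, pos) tuple layout are illustrative assumptions, not part of the question.

```python
def bayes_step(priors, likelihoods):
    """Apply one word's likelihoods to the current (neg, pos) priors.

    priors      -- (neg_prior, pos_prior), e.g. (0.5, 0.5) before the first word
    likelihoods -- (neg, pos) for the current word, e.g. [0.0005, 0.0025]
    """
    neg_prior, pos_prior = priors
    neg, pos = likelihoods
    # Shared denominator of both formulas
    evidence = neg_prior * neg + pos_prior * pos
    neg_post_prob = (neg_prior * neg) / evidence
    pos_post_prob = (pos_prior * pos) / evidence
    return neg_post_prob, pos_post_prob


# First word of review 1 ("like"), starting from the 0.5 priors
print(bayes_step((0.5, 0.5), (0.0005, 0.0025)))  # ≈ (1/6, 5/6)
```

The returned pair is then passed back in as the priors for the next word.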

Here is my code for reviews 1 and 2, written out by hand:

#Review 1: 

# the prior before starting the iteration is 0.5
prior = 0.5

# priors after the first word "like"
neg_prior_like = (prior*0.0005) / (prior * 0.0005 + prior * 0.0025)
pos_prior_like = (prior*0.0025) / (prior * 0.0005 + prior * 0.0025)


# priors after the second word "the"
neg_prior_like_the = (neg_prior_like * 0.5) / (neg_prior_like * 0.5 + pos_prior_like * 0.5)
pos_prior_like_the = (pos_prior_like * 0.5) / (neg_prior_like * 0.5 + pos_prior_like * 0.5)


# post_prob after last word "acting"
neg_post_prob = (neg_prior_like_the * 0.5) / (neg_prior_like_the * 0.5 + pos_prior_like_the * 0.5)
pos_post_prob = (pos_prior_like_the * 0.5) / (neg_prior_like_the * 0.5 + pos_prior_like_the * 0.5)


validation = neg_post_prob + pos_post_prob

#Review 2: 

# the prior before starting the iteration is 0.5
prior = 0.5

# priors after the first word "plot"
neg_prior_plot = (prior*0.5) / (prior * 0.5 + prior * 0.5)
pos_prior_plot = (prior*0.5) / (prior * 0.5 + prior * 0.5)


# priors after the second word "hate"
neg_prior_plot_hate = (neg_prior_plot * 0.0029) / (neg_prior_plot * 0.0029 + pos_prior_plot * 0.0002)
pos_prior_plot_hate = (pos_prior_plot * 0.0002) / (neg_prior_plot * 0.0029 + pos_prior_plot * 0.0002)


# post_prob after last word "story"
neg_post_prob = (neg_prior_plot_hate * 0.5) / (neg_prior_plot_hate * 0.5 + pos_prior_plot_hate * 0.5)
pos_post_prob = (pos_prior_plot_hate * 0.5) / (neg_prior_plot_hate * 0.5 + pos_prior_plot_hate * 0.5)


validation = neg_post_prob + pos_post_prob
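Chaining these steps is equivalent to multiplying the starting prior by every word's likelihood and normalising once at the end, because the per-step denominators cancel. As a consequence, words with equal values such as [0.5, 0.5] do not change the result. A sketch using only the standard library (math.prod needs Python 3.8+); closed_form is a hypothetical helper name:

```python
import math


def closed_form(word_values, prior=0.5):
    # Unnormalised scores: prior times the product of every word's likelihood
    neg_score = prior * math.prod(v[0] for v in word_values.values())
    pos_score = prior * math.prod(v[1] for v in word_values.values())
    total = neg_score + pos_score
    return neg_score / total, pos_score / total


review_2 = {'plot': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'story': [0.5, 0.5]}
hate_only = {'hate': [0.0029, 0.0002]}

print(closed_form(review_2))   # ≈ (0.9355, 0.0645)
print(closed_form(hate_only))  # same posterior: the [0.5, 0.5] words drop out
```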

But the result I want for all reviews is:

import pandas as pd

sentiment = {'review': [1, 2],
             'neg_post_prob': [0.17, 0.94],
             'pos_post_prob': [0.83, 0.06],
             'validation': [1, 1]
             }

sentiment = pd.DataFrame(sentiment, columns = ['review', 'neg_post_prob', 'pos_post_prob', 'validation'])

print(sentiment)


Answer


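Use reduce from the functools module. It applies the update function to each word in turn, feeding the previous (neg, pos) pair back in as the priors for the next word.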
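reduce(function, iterable, initializer) passes the running result back in as the first argument of each call, which matches "the previous posterior becomes the next prior". A minimal illustration with a hypothetical tracing function step that just records the nesting:

```python
from functools import reduce


def step(acc, word):
    # acc is whatever the previous call returned (initially the initializer)
    return f"step({acc}, {word})"


print(reduce(step, ['like', 'the', 'acting'], 'prior'))
# step(step(step(prior, like), the), acting)
```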

Code

from functools import reduce
import pandas as pd

def update(priors, values):
    '''
        Provides updated probabilities based upon previous pair of neg, pos
    '''
    # Previous neg, pos pair (the priors)
    neg, pos = priors

    # New negative and positive (using the OP's update equations)
    scale = (neg * values[0] + pos * values[1])  # denominator
    new_neg = (neg * values[0]) / scale
    new_pos = (pos * values[1]) / scale
    return new_neg, new_pos                      # new update pair
    
def calc(reviews):
    ''' Main function to perform calculations and 
        produce pandas data frame
    '''
    sentiment = {'review':[],
                 'neg_post_prob': [],
                 'pos_post_prob': [],
                 'validation': []}
    
    for review_id, word_values in reviews.items():
        # word_values is dictionary of negative/positive for words in review
        values = word_values.values()  # [neg, pos] pairs, in word order
        
        # Use reduce to apply update iteratively to the sequence of values,
        # starting from priors of 0.5
        result = reduce(update, values, [0.5, 0.5])
        neg, pos = result
        validation = neg + pos
        
        # Update results
        sentiment['review'].append(review_id)
        sentiment['neg_post_prob'].append(neg)
        sentiment['pos_post_prob'].append(pos)
        sentiment['validation'].append(validation)
        
    
    return pd.DataFrame(sentiment)
        
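If reduce feels opaque, the same calculation can be written as an explicit nested loop. This sketch returns a plain dict of lists (the same shape pd.DataFrame accepts, so pd.DataFrame(calc_loop(reviews)) would give the table) and runs without pandas; calc_loop is a hypothetical name.

```python
def calc_loop(reviews, prior=0.5):
    sentiment = {'review': [], 'neg_post_prob': [], 'pos_post_prob': [], 'validation': []}
    for review_id, word_values in reviews.items():
        neg_prior, pos_prior = prior, prior  # reset before the first word of each review
        for neg, pos in word_values.values():
            evidence = neg_prior * neg + pos_prior * pos
            # The right-hand side uses the old priors; the results become the new priors
            neg_prior, pos_prior = neg_prior * neg / evidence, pos_prior * pos / evidence
        sentiment['review'].append(review_id)
        sentiment['neg_post_prob'].append(neg_prior)
        sentiment['pos_post_prob'].append(pos_prior)
        sentiment['validation'].append(neg_prior + pos_prior)
    return sentiment


reviews = {1: {'like': [0.0005, 0.0025], 'the': [0.5, 0.5], 'acting': [0.5, 0.5]},
           2: {'plot': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'story': [0.5, 0.5]}}

result = calc_loop(reviews)
print([round(p, 2) for p in result['neg_post_prob']])  # [0.17, 0.94]
print([round(p, 2) for p in result['pos_post_prob']])  # [0.83, 0.06]
```

Rounding to two decimals reproduces the values requested in the question.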

Test

reviews= {1: {'like': [0.0005, 0.0025], 'the': [0.5, 0.5], 'acting': [0.5, 0.5]},
          2: {'plot': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'story': [0.5, 0.5]}}

df = calc(reviews)

df

    review  neg_post_prob   pos_post_prob   validation
0   1       0.166667        0.833333        1.0
1   2       0.935484        0.064516        1.0
Answered 2020-11-14T11:49:44.293