
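I can't find the most efficient way to extract the highest value among the VADER polarity scores and assign it to a new column in my dataframe as positive, negative or neutral.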

Data:

import pandas as pd

data = {
     'text': ['review text 1', 'review text 2', 'review text 3'],
     'stars': [5, 4, 3],
     'compound_score': [0.9950, 0.9940, 0.3450]
      }

df = pd.DataFrame(data)

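For each review in the dataframe, I want to extract the highest polarity score from VADER rather than the compound score. So if review 3 has the polarity scores {'neg': 0.279, 'neu': 0.543, 'pos': 0.178, 'compound': -0.3182}, how can I create a new column that contains only the 'neu' score (labeled 'neutral'), since it has the highest value of the three, leaving the compound score out of the comparison? The new column would have three possible values: positive, negative and neutral, depending on VADER's polarity scores.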
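To make the selection rule concrete, here is a minimal sketch for a single review, using the scores quoted above. The `LABELS` mapping and the `top_polarity` helper are hypothetical names introduced only for illustration; they are not part of VADER.

```python
# Hypothetical mapping from VADER's keys to the labels wanted in the new column
LABELS = {'neg': 'negative', 'neu': 'neutral', 'pos': 'positive'}

def top_polarity(scores):
    """Return the label of the largest of neg/neu/pos, ignoring 'compound'."""
    # Only the keys in LABELS are compared, so 'compound' never takes part
    best_key = max(LABELS, key=lambda k: scores[k])
    return LABELS[best_key]

scores = {'neg': 0.279, 'neu': 0.543, 'pos': 0.178, 'compound': -0.3182}
print(top_polarity(scores))  # neutral
```

On a tie, `max` keeps the first key it encounters, so with this ordering 'neg' wins over 'neu', and 'neu' over 'pos'.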

My new dataframe would look like this:

data = {
     'text': ['review text 1', 'review text 2', 'review text 3'],
     'stars': [5, 4, 3],
     'compound_score': [0.9950, 0.9940, 0.3450],
     'polarity': ['positive', 'positive', 'neutral']
      }

My solution so far looks like this:

import numpy as np
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Assumption: sid is NLTK's VADER analyzer (needs the vader_lexicon resource)
sid = SentimentIntensityAnalyzer()

# Create a new column that contains the polarity scores
df['polarity_scores'] = df['text'].apply(lambda text: sid.polarity_scores(str(text)))

# Extract the neg, neu and pos scores into separate columns
df['neg'] = df['polarity_scores'].apply(lambda score: score['neg'])
df['neu'] = df['polarity_scores'].apply(lambda score: score['neu'])
df['pos'] = df['polarity_scores'].apply(lambda score: score['pos'])

# Write the label of the highest score to a new column
# (rows that are neither strictly 'neg' nor strictly 'neu' default to 'positive')
df['polarity_label'] = np.where((df['neg'] > df['neu']) & (df['neg'] > df['pos']), 'negative', 'positive')
df['polarity_label'] = np.where((df['neu'] > df['neg']) & (df['neu'] > df['pos']), 'neutral', df['polarity_label'])
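A possibly shorter alternative to the two np.where calls is pandas' `idxmax`, sketched here under the assumption that the neg, neu and pos scores already sit in their own columns. The numbers are illustrative only: the third row uses the scores quoted in the question, while the first two rows are made-up stand-ins for real VADER output, and `label_map` is a hypothetical name.

```python
import pandas as pd

# Illustrative scores only: row 3 uses the values quoted in the question,
# rows 1 and 2 are made-up numbers standing in for real VADER output.
scores = pd.DataFrame({
    'neg': [0.000, 0.050, 0.279],
    'neu': [0.300, 0.350, 0.543],
    'pos': [0.700, 0.600, 0.178],
})

# Hypothetical mapping from column names to the desired labels
label_map = {'neg': 'negative', 'neu': 'neutral', 'pos': 'positive'}

# idxmax(axis=1) returns, for each row, the name of the column holding the maximum
scores['polarity_label'] = scores[['neg', 'neu', 'pos']].idxmax(axis=1).map(label_map)
print(scores['polarity_label'].tolist())  # ['positive', 'positive', 'neutral']
```

Because only the three score columns are passed to `idxmax`, the compound score never enters the comparison. On ties, `idxmax` returns the first column in the given order.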