python - Using the maximum VADER polarity score as the label for a new column

Question

I can't work out the most efficient way to take the highest of the VADER polarity scores and assign it to a new column in my dataframe as positive, negative, or neutral.

Data:

import pandas as pd

data = {
     'text': ['review text 1', 'review text 2', 'review text 3'],
     'stars': [5, 4, 3],
     'compound_score': [0.9950, 0.9940, 0.3450]
      }

df = pd.DataFrame(data)

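For each review in the dataframe, I want to take the highest polarity score from VADER, excluding the compound score. For example, if review 3 has the polarity scores {'neg': 0.279, 'neu': 0.543, 'pos': 0.178, 'compound': -0.3182}, how do I create a new column that holds only the label for the 'neu' score (named 'neutral'), since it is the highest of the three, with the compound score left out of the comparison? The new column would have three possible values: positive, negative, and neutral, depending on VADER's polarity scores.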
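For a single review, the selection described above can be sketched in plain Python. The helper name `polarity_label` and the `LABELS` mapping are hypothetical names chosen for illustration; only the 'neg', 'neu', 'pos', and 'compound' keys come from the scores shown in the question.

```python
# Hypothetical mapping from VADER score keys to the desired label names.
# 'compound' is deliberately absent, so it never takes part in the comparison.
LABELS = {'neg': 'negative', 'neu': 'neutral', 'pos': 'positive'}

def polarity_label(scores):
    """Return the label of the largest of the neg/neu/pos scores."""
    # max() walks the keys of LABELS and compares their scores;
    # on a tie, the key listed first in LABELS wins.
    top_key = max(LABELS, key=lambda key: scores[key])
    return LABELS[top_key]

review_3 = {'neg': 0.279, 'neu': 0.543, 'pos': 0.178, 'compound': -0.3182}
print(polarity_label(review_3))  # neutral
```

Such a function could then be applied to a column of score dictionaries with `.apply`, at the cost of a Python-level call per row.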

My new dataframe would look like this:

data = {
     'text': ['review text 1', 'review text 2', 'review text 3'],
     'stars': [5, 4, 3],
     'compound_score': [0.9950, 0.9940, 0.3450],
     'polarity': ['positive', 'positive', 'neutral']
      }

So far, my solution looks like this:

 import numpy as np
 # Assumption: sid is NLTK's VADER analyzer (requires nltk.download('vader_lexicon'))
 from nltk.sentiment.vader import SentimentIntensityAnalyzer

 sid = SentimentIntensityAnalyzer()

 # Create a new column that contains the polarity scores
 df['polarity_scores'] = df['text'].apply(lambda text: sid.polarity_scores(str(text)))

 # Extract the neg, neu, pos scores from the polarity scores into separate columns
 df['neg'] = df['polarity_scores'].apply(lambda score: score['neg'])
 df['neu'] = df['polarity_scores'].apply(lambda score: score['neu'])
 df['pos'] = df['polarity_scores'].apply(lambda score: score['pos'])

 # Label each row with the highest of the three scores
 # (rows where neither neg nor neu is strictly the largest default to 'positive')
 df['polarity_label'] = np.where((df['neg'] > df['neu']) & (df['neg'] > df['pos']), 'negative', 'positive')
 df['polarity_label'] = np.where((df['neu'] > df['neg']) & (df['neu'] > df['pos']), 'neutral', df['polarity_label'])
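One possible alternative to the chained `np.where` calls is a minimal sketch using pandas `idxmax`, which returns the name of the column holding the largest value in each row. The scores below are made-up stand-ins for the VADER output, and `label_names` is a hypothetical mapping introduced for illustration.

```python
import pandas as pd

# Made-up scores standing in for the neg/neu/pos columns built from VADER.
df = pd.DataFrame({
    'neg': [0.000, 0.050, 0.279],
    'neu': [0.300, 0.400, 0.543],
    'pos': [0.700, 0.550, 0.178],
})

# Hypothetical mapping from score column names to the desired labels.
label_names = {'neg': 'negative', 'neu': 'neutral', 'pos': 'positive'}

# idxmax(axis=1) gives, for each row, the column name with the largest value;
# the compound score is excluded simply by not selecting that column.
df['polarity_label'] = df[['neg', 'neu', 'pos']].idxmax(axis=1).map(label_names)

print(df['polarity_label'].tolist())  # ['positive', 'positive', 'neutral']
```

The two approaches treat ties differently: `idxmax` picks the first of the tied columns in the order they were selected, whereas the `np.where` version falls back to 'positive' whenever neither 'neg' nor 'neu' is strictly the largest.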
