I'm new to exploratory analysis, but I've computed a sentiment score for each comment:
df['polarity'] = df['Comment'].apply(lambda x: TextBlob(x).sentiment.polarity)
I also built n-grams of the most common words in the data frame:
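from sklearn.feature_extraction.text import CountVectorizer

def get_top_n_words(corpus, n=None):
    # Learn the vocabulary and build the document-term matrix
    vec = CountVectorizer().fit(corpus)
    bag_of_words = vec.transform(corpus)
    # Total occurrences of each term across all documents
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]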
common_words = get_top_n_words(df['Comment'], 20)
for word, freq in common_words:
    print(word, freq)
df1 = pd.DataFrame(common_words, columns=['Comment', 'count'])
df1.groupby('Comment').sum()['count'].sort_values(ascending=False).iplot(
    kind='bar', yTitle='Count', color='blue',
    title='Top 20 Words in Comments Before Removing Stop Words')
How can I isolate the text with negative polarity (< 0) and create n-grams that analyze only the negative-sentiment text?
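
A minimal sketch of one way this might be done: filter the rows with a boolean mask on `polarity`, then pass only those comments to the counting function. The toy data frame, its hard-coded polarity values, and the helper name `get_top_ngrams` with its `ngram_range` argument are assumptions for illustration; in the real data the `polarity` column would come from the TextBlob line above.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Toy data standing in for the real df. The polarity values are made up
# for illustration (in the post they are computed with TextBlob).
df = pd.DataFrame({
    "Comment": [
        "the service was terrible",
        "great product and fast shipping",
        "terrible service and rude staff",
        "the product was okay",
    ],
    "polarity": [-1.0, 0.8, -0.65, 0.5],
})

def get_top_ngrams(corpus, n=None, ngram_range=(1, 1)):
    """Hypothetical variant of get_top_n_words that also accepts ngram_range."""
    vec = CountVectorizer(ngram_range=ngram_range).fit(corpus)
    bag_of_words = vec.transform(corpus)
    # Flatten the column sums to a 1-D array of per-term totals
    totals = np.asarray(bag_of_words.sum(axis=0)).ravel()
    words_freq = [(term, int(totals[idx])) for term, idx in vec.vocabulary_.items()]
    return sorted(words_freq, key=lambda x: x[1], reverse=True)[:n]

# Keep only the comments whose polarity is below zero
negative_comments = df.loc[df["polarity"] < 0, "Comment"]

# Most frequent single words and bigrams in the negative comments only
top_negative_words = get_top_ngrams(negative_comments, n=20)
top_negative_bigrams = get_top_ngrams(negative_comments, n=20, ngram_range=(2, 2))

for term, freq in top_negative_bigrams:
    print(term, freq)
```

The resulting list of `(term, count)` pairs has the same shape as `common_words`, so it should be usable with the same `pd.DataFrame(...)` and plotting code as before.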