I've built my own custom dictionaries, and now I want to map them onto my tweets DataFrame. How can I do this?
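So basically, I have these 3 dictionaries: positive, negative, and neutral words. I have a Twitter dataset, and I want to map my dictionaries onto it to determine the sentiment of each tweet. Here is what I have done so far: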
positive = '1'
negative = '-1'
neutral = '0'

pos_Words = set(['good', 'beautiful', 'best'])
neg_Words = set(['bad', 'suck', 'damn'])

def sentiment(words):
    pslen = len(pos_Words.intersection(words))
    nglen = len(neg_Words.intersection(words))
    if pslen > nglen:
        return positive
    elif pslen < nglen:
        return negative
    else:
        return neutral

from collections import Counter

def count_senti(sentences):
    sents = Counter()  # number of sentences per sentiment label
    words = Counter()  # number of tokens per sentiment label
    for sentence in sentences:
        senti = sentiment(sentence)
        sents[senti] += 1
        words[senti] += len(sentence)
    return sents, words

import nltk

def parse_senti(text):
    # text must be a single string: split it into sentences, then into lowercase tokens
    sentences = [
        [word.lower() for word in nltk.word_tokenize(sentence)]
        for sentence in nltk.sent_tokenize(text)
    ]
    sents, words = count_senti(sentences)
    total = sum(words.values())
    # 'label' avoids shadowing the sentiment() function defined above
    for label, count in words.items():
        pcent = (count / total) * 100
        nsents = sents[label]
        print(pcent, label, nsents)

parse_senti('good. bad')
The result is:
66.66666666666666 1 1
33.33333333333333 -1 1
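The percentages are weighted by token count, not by sentence count: `count_senti` adds `len(sentence)` (the number of tokens, punctuation included) to each label's total. A minimal sketch of that arithmetic, assuming NLTK splits 'good. bad' into the token lists ['good', '.'] and ['bad'] (the hard-coded lists below stand in for the tokenizer):

```python
from collections import Counter

pos_words = {'good', 'beautiful', 'best'}
neg_words = {'bad', 'suck', 'damn'}

def label(tokens):
    """Return '1', '-1' or '0' by comparing positive vs. negative hits."""
    ps = len(pos_words.intersection(tokens))
    ng = len(neg_words.intersection(tokens))
    if ps > ng:
        return '1'
    if ps < ng:
        return '-1'
    return '0'

# Assumed tokenizer output for 'good. bad' (two sentences).
sentences = [['good', '.'], ['bad']]

sents = Counter()  # number of sentences per label
words = Counter()  # number of tokens per label
for tokens in sentences:
    lab = label(tokens)
    sents[lab] += 1
    words[lab] += len(tokens)

total = sum(words.values())  # 3 tokens in all
# '1' gets 2 of the 3 tokens (the '.' counts), '-1' gets 1 of 3.
shares = {lab: count / total * 100 for lab, count in words.items()}
print(shares)
```

So the first sentence counts for two thirds only because its period is a token, even though each label has exactly one sentence.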
But I want it mapped onto each tweet in my Twitter DataFrame, which I wrote out as a CSV.
Any ideas?
I tried parse_senti('dataframe')
and got the error: expected string or bytes-like object