我正在尝试使用 AFINN 字典(get_sentiments(“afinn”))对推文数据集的情绪。下面提供了数据集的示例:
A tibble: 10 x 2
Date TweetText
<dttm> <chr>
1 2018-02-10 21:58:19 "RT @RealSirTomJones: Still got the moves! That was a lo~
2 2018-02-10 21:58:19 "Yass Tom \U0001f600 #snakehips still got it #TheVoiceUK"
3 2018-02-10 21:58:19 Yasss tom he’s some chanter #TheVoiceUK #ItsNotUnusual
4 2018-02-10 21:58:20 #TheVoiceUK SIR TOM JONES...HE'S STILL HOT... AMAZING VO~
5 2018-02-10 21:58:21 I wonder how many hips Tom Jones has been through? #TheV~
6 2018-02-10 21:58:21 Tom Jones has still got it!!! #TheVoiceUK
7 2018-02-10 21:58:21 Good grief Tom Jones is amazing #TheVoiceuk
8 2018-02-10 21:58:21 RT @tonysheps: Sir Thomas Jones you’re a bloody legend #~
9 2018-02-10 21:58:22 @ITV Tom Jones what a legend!!! ❤️ #StillGotIt #TheVoice~
10 2018-02-10 21:58:22 "RT @RealSirTomJones: Still got the moves! That was a lo~
我想要做的是: 1. 将推文分成单独的单词。2. 使用 AFINN 词典对这些单词进行评分。3. 将每条推文的所有单词的得分相加 4. 将这个和返回到新的第三列,这样我就可以看到每条推文的得分。
对于类似的词典,我发现了以下代码:
# Initiate the scoreTopic
scoreTopic <- 0
# Start a loop over the documents
for (i in 1:length (myCorpus)) {
# Store separate words in character vector
terms <- unlist(strsplit(myCorpus[[i]]$content, " "))
# Determine the number of positive matches
pos_matches <- sum(terms %in% positive_words)
# Determine the number of negative matches
neg_matches <- sum(terms %in% negative_words)
# Store the difference in the results vector
scoreTopic [i] <- pos_matches - neg_matches
} # End of the for loop
dsMyTweets$score <- scoreTopic
但是,我无法调整此代码以使其与 afinn 字典一起使用。