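I have been working with the sentiment lexicons and noticed that both the bing and nrc datasets contain some words that are labeled with positive and negative sentiment at the same time.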
**bing: three words with both positive and negative sentiment**
library(dplyr)
library(tidytext)
env_test_bing_raw <- get_sentiments("bing") %>%
  filter(word %in% c("envious", "enviously", "enviousness"))
# A tibble: 6 x 2
  word        sentiment
  <chr>       <chr>
1 envious     positive
2 envious     negative
3 enviously   positive
4 enviously   negative
5 enviousness positive
6 enviousness negative
**nrc: 81 words with both positive and negative sentiment**
# Words that have more than one positive/negative row in nrc
test_nrc <- as.data.frame(
  get_sentiments("nrc") %>%
    filter(sentiment %in% c("positive", "negative")) %>%
    group_by(word) %>%
    summarize(count = n()) %>%
    filter(count > 1))
# Pull the positive/negative rows for just those words
env_test_nrc <- get_sentiments("nrc") %>%
  filter(sentiment %in% c("positive", "negative")) %>%
  filter(word %in% test_nrc$word)
# A tibble: 162 x 2
   word       sentiment
   <chr>      <chr>
 1 abundance  negative
 2 abundance  positive
 3 armed      negative
 4 armed      positive
 5 balm       negative
 6 balm       positive
 7 boast      negative
 8 boast      positive
 9 boisterous negative
10 boisterous positive
# ... with 152 more rows
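I'm curious whether I am doing something wrong, or how a word can carry both negative and positive sentiment within a single source dataset. What is the standard practice for handling these cases?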
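For context, the only workaround I have come up with so far is to drop every word that carries both labels before joining the lexicon to my text. I do not know whether this is the accepted approach, so treat it as a rough sketch. The `toy_lexicon` tibble below is made up for illustration and only stands in for the output of `get_sentiments()`; the names `ambiguous` and `clean_lexicon` are mine as well.

```r
library(dplyr)

# Hypothetical toy lexicon standing in for get_sentiments("bing");
# these rows are illustrative, not copied from the real dataset.
toy_lexicon <- tibble(
  word      = c("envious", "envious", "happy", "sad"),
  sentiment = c("positive", "negative", "positive", "negative")
)

# Words that appear with more than one distinct sentiment label
ambiguous <- toy_lexicon %>%
  group_by(word) %>%
  summarize(n_labels = n_distinct(sentiment)) %>%
  filter(n_labels > 1)

# Keep only the words with a single, unambiguous label
clean_lexicon <- toy_lexicon %>%
  anti_join(ambiguous, by = "word")
```

Dropping the words outright loses information, so it may be better to resolve each conflict by hand instead; I would like to know which option people actually use.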
Thanks!