要在 dtm 上使用 tidytext 获取情绪,首先将 dtm 转换为 tidy 格式,然后在 tidy 数据和极化词词典之间进行内部连接。我将使用与上面使用的相同的文档。上面示例中的一些文档是正面的,但给出了中性分数。让我们看看 tidytext 的表现如何
library(tidytext)
library(tm)
library(dplyr)
library(tidyr)
documents <- c("Wow I really like the new light sabers",
"That book was excellent",
"R is a fantastic language",
"The service in this restaurant was miserable",
"This is neither positive or negative",
"The waiter forget about my dessert -- what poor service")
# create tidy format
vectors <- as.character(documents)
v_source <- VectorSource(vectors)
corpuss <- VCorpus(v_source)
dtm <- DocumentTermMatrix(corpuss)
as_tidy <- tidy(dtm)
# Using bing lexicon: you can use other as well(nrc/afinn)
bing <- get_sentiments("bing")
as_bing_words <- inner_join(as_tidy,bing,by = c("term"="word"))
# check positive and negative words
as_bing_words
# set index for documents number
index <- as_bing_words%>%mutate(doc=as.numeric(document))
# count by index and sentiment
index <- index %>% count(sentiment,doc)
# spread into positives and negavtives
index <- index %>% spread(sentiment,n,fill=0)
# add polarity scorer
index <- index %>% mutate(polarity = positive-negative)
index
Doc 4 和 6 是阴性的,5 中性和其余阳性,实际上是这样