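I am trying to apply sentiment analysis in R using my DTM (document-term matrix) or TDM (term-document matrix). I could not find a similar topic on the forum or through Google. So far I have created a corpus and generated a dtm/tdm from it in R. My next step is to apply sentiment analysis, and later I need to do stock prediction with an SVM. My code is: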

    library(tm)

    # docs is the tm corpus created earlier (already cleaned and stemmed)
    dtm <- DocumentTermMatrix(docs)
    dtm <- removeSparseTerms(dtm, 0.99)
    dtm <- as.data.frame(as.matrix(dtm))

    tdm <- TermDocumentMatrix(docs)
    tdm <- removeSparseTerms(tdm, 0.99)
    tdm <- as.data.frame(as.matrix(tdm))

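I read that this is possible with the tidytext package, using the get_sentiments() function, but I could not apply it to a DTM/TDM. How can I run sentiment analysis on my cleaned, filtered words, which are already stemmed, tokenized, etc.? I have seen many people run sentiment analysis on whole sentences, but I would like to apply it to my individual words to see whether they are positive or negative, what their score is, and so on. Thanks in advance!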

2 Answers

The SentimentAnalysis package integrates well with tm: analyzeSentiment() accepts a DocumentTermMatrix directly.

    library(tm)
    library(SentimentAnalysis)

    documents <- c("Wow, I really like the new light sabers!",
                   "That book was excellent.",
                   "R is a fantastic language.",
                   "The service in this restaurant was miserable.",
                   "This is neither positive or negative.",
                   "The waiter forget about my dessert -- what poor service!")

    vc <- VCorpus(VectorSource(documents))
    dtm <- DocumentTermMatrix(vc)

    analyzeSentiment(dtm,
      rules=list(
        "SentimentLM"=list(
          ruleSentiment, loadDictionaryLM()
        ),
        "SentimentQDAP"=list(
          ruleSentiment, loadDictionaryQDAP()
        )
      )
    )
    #   SentimentLM SentimentQDAP
    # 1       0.000     0.1428571
    # 2       0.000     0.0000000
    # 3       0.000     0.0000000
    # 4       0.000     0.0000000
    # 5       0.000     0.0000000
    # 6      -0.125    -0.2500000
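
Since the question asks for positive/negative labels rather than raw scores, here is a minimal base-R sketch of one way to turn the scores printed above into labels. The variable names (`qdap_scores`, `direction`) are only illustrative, and the values are copied by hand from the SentimentQDAP column. The package also has a `convertToDirection()` helper for this purpose; check its documentation before relying on it.

```r
# Scores copied from the SentimentQDAP column printed above
qdap_scores <- c(0.1428571, 0, 0, 0, 0, -0.25)

# sign() yields -1, 0 or 1; adding 2 maps that to positions 1..3 of `labels`
labels <- c("negative", "neutral", "positive")
direction <- factor(labels[sign(qdap_scores) + 2], levels = labels)

# One row per document with its score and label
data.frame(doc = seq_along(qdap_scores),
           score = qdap_scores,
           direction = direction)
```

Any score of exactly zero is treated as neutral here; a wider neutral band would need an explicit threshold instead of sign().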

Answered 2019-06-09T17:15:07.143

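To get sentiment from a dtm with tidytext, first convert the dtm to tidy format, then do an inner join between the tidy data and a lexicon of polarized words. I will use the same documents as in the answer above. Some of the documents in that example are positive but received a neutral score. Let's see how tidytext performs.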

    library(tidytext)
    library(tm)
    library(dplyr)
    library(tidyr)

    documents <- c("Wow I really like the new light sabers",
                   "That book was excellent",
                   "R is a fantastic language",
                   "The service in this restaurant was miserable",
                   "This is neither positive or negative",
                   "The waiter forget about my dessert -- what poor service")

    # create tidy format: one row per document/term pair
    vectors <- as.character(documents)
    v_source <- VectorSource(vectors)
    corpuss <- VCorpus(v_source)
    dtm <- DocumentTermMatrix(corpuss)
    as_tidy <- tidy(dtm)

    # using the bing lexicon; other lexicons (nrc/afinn) work as well
    bing <- get_sentiments("bing")
    as_bing_words <- inner_join(as_tidy, bing, by = c("term" = "word"))
    # check positive and negative words
    as_bing_words

    # set a numeric index for the document number
    index <- as_bing_words %>% mutate(doc = as.numeric(document))
    # count by index and sentiment
    index <- index %>% count(sentiment, doc)
    # spread into positives and negatives
    index <- index %>% spread(sentiment, n, fill = 0)
    # add polarity score
    index <- index %>% mutate(polarity = positive - negative)
    index

Documents 4 and 6 come out negative, document 5 neutral, and the rest positive, which matches what the sentences actually say.
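
For the per-word view the question asks about, the same inner-join idea can be aggregated by term instead of by document. The following is a minimal base-R sketch under stated assumptions: `term_counts` is made-up data in the long layout described above, and `toy_lexicon` is a hypothetical two-word stand-in for a real lexicon, so nothing here depends on installed packages.

```r
# Made-up document-term counts in long (tidy) layout:
# one row per document/term pair
term_counts <- data.frame(
  document = c("1", "1", "2", "3"),
  term     = c("excellent", "service", "poor", "service"),
  count    = c(1, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical mini lexicon standing in for a real polarity lexicon
toy_lexicon <- data.frame(
  word      = c("excellent", "poor"),
  sentiment = c("positive", "negative"),
  stringsAsFactors = FALSE
)

# merge() with default settings is an inner join:
# terms without a lexicon entry ("service") drop out
scored <- merge(term_counts, toy_lexicon, by.x = "term", by.y = "word")

# Signed score per row, weighted by how often the term occurs
scored$score <- ifelse(scored$sentiment == "positive", 1, -1) * scored$count

# Total score per distinct term across all documents
term_polarity <- aggregate(score ~ term, data = scored, FUN = sum)
term_polarity
```

One caveat for a stemmed corpus: a stemmed term only matches if the lexicon contains that same stemmed form, so it may be necessary to stem the lexicon words the same way or to join before stemming.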

Answered 2019-08-23T07:36:39.593