
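I am doing sentiment analysis on financial articles. To improve the accuracy of my naive Bayes classifier, I would like to implement negation handling.

Specifically, I want to add the prefix "not_" to the word that follows "not" or "n't".

So if my corpus contains something like this: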

 x <- "They didn't sell the company." 

I would like to get the following:

"they didn't not_sell the company."

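(The stopword "didn't" will be removed later.)

The only thing I could find is the gsub() function, but it does not seem to work for this task.

Any help would be appreciated! Thank you!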

1 Answer

Specifically, I want to add the prefix "not_" to the word that follows "not" or "n't"

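str_negate <- function(x) {
  # Handle "n't" first, then the standalone word "not"; \\b keeps words such as "knot" from matching
  gsub("\\bnot ", "not not_", gsub("n't ", "n't not_", x))
}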

Or I suppose you could use strsplit:

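str_negate <- function(x) {
  # Split the sentence into space-separated tokens
  tokens <- unlist(strsplit(x, split = " "))
  # Flag tokens that are exactly "not" or contain "n't" (case-insensitive)
  is_negative <- grepl("^not$|n't", tokens, ignore.case = TRUE)
  # Shift the flags right by one so they mark the word after each negation
  negate_me <- head(c(FALSE, is_negative), -1)
  tokens[negate_me] <- paste0("not_", tokens[negate_me])
  paste(tokens, collapse = " ")
}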

Either way, you get:

> str_negate("They didn't sell the company")
[1] "They didn't not_sell the company"
> str_negate("They did not sell the company")
[1] "They did not not_sell the company"
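
If punctuation or multiple spaces are a concern, the same idea can probably be written as a single gsub() call with capture groups. The following is only a sketch under a few assumptions: the name str_negate_regex is hypothetical, perl = TRUE is used so that \\s and \\w behave as PCRE classes, and "not" is matched only as a whole word (so "cannot" is left alone).

```r
# Hypothetical single-pass variant; not part of the original answer's code.
str_negate_regex <- function(x) {
  # \\1 captures the negation: "not" at a word boundary, or a trailing "n't"
  # \\2 captures the first character of the following word
  # gsub() is vectorized, so x may be a whole character vector of sentences
  gsub("(\\bnot|n't)\\s+(\\w)", "\\1 not_\\2", x,
       ignore.case = TRUE, perl = TRUE)
}

str_negate_regex(c("They didn't sell the company.",
                   "They did not sell the company."))
# [1] "They didn't not_sell the company."  "They did not not_sell the company."
```

Note that this is not idempotent: running it a second time on its own output would prefix the already-marked word again, so it should be applied once, before stopword removal.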
answered 2014-06-30T21:01:21.860