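I need some help figuring out how to replicate, in R, a solution that adds the tag "NOT_" to every word following a negation word, up to the next punctuation mark.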

A Python solution can be found here: How to add tags to negated words in strings that follow "not", "no" and "never"

I have the following solution, which adds the tag "NOT_" only to the next word after a negation word (not, never, n't, without, unlikely to):

str_negate <- function(x) {
  gsub("not ","not NOT_",
            gsub("n't ","n't NOT_",
            gsub("never ","never NOT_",
            gsub("without ","without NOT_",
            gsub("unlikely to ","unlikely to NOT_",x)))))
}

str_negate(FeedbackCommentsVectorProc$Sentences)

But I need to adapt it so that the tag "NOT_" is added to every word up to the next punctuation mark.

Any help is much appreciated!

1 Answer

EDIT

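After trying to figure this out, here is the simplest solution I could come up with. Note: this will fail if a string has more than one negation word before a punctuation mark.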

library(gsubfn)
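str_negate <- function(x) {
   # Step 1: tag the first word after each negation word
   x1 <- gsub("(not|n't|never|without|unlikely to) (\\w+)", '\\1 NOT_\\2', x)
   # Step 2: take the run of non-punctuation text after each tag and prefix every word in it
   # (inside the formula, x is the captured run, not the outer argument)
   x2 <- gsubfn('NOT_([^[:punct:]]+)', ~ gsub('(\\w+)', 'NOT_\\1', x), x1)
   x2
}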
x <- "It was never going to work, he thought. He did not play so well, so he had to practice some more."
str_negate(x)
## [1] "It was never NOT_going NOT_to NOT_work, he thought. He did not NOT_play NOT_so NOT_well, so he had to practice some more."

And this version handles the case where more than one negation word comes before the punctuation mark:

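str_negate <- function(x) {
   # Step 1: insert NOT_ right after each negation word (\K keeps the negation word itself out of the match)
   x1 <- gsub("(not|n't|never|without|unlikely to) \\K", 'NOT_', x, perl=TRUE)
   # Step 2: within each tagged run, prefix every word that does not start with a negation word;
   # the case-insensitive lookahead also skips words that already carry NOT_
   x2 <- gsubfn('NOT_([a-zA-Z_ ]+)', ~ gsub("\\b(?!(?i:not|n't|never|without|unlikely to))(?=\\w+)", 'NOT_', x, perl=TRUE), x1)
   x2
}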
x <- 'It was unlikely to work and it seems like it never was going to end.'
str_negate(x)
## [1] "It was unlikely to NOT_work NOT_and NOT_it NOT_seems NOT_like NOT_it never NOT_was NOT_going NOT_to NOT_end."
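If you would rather avoid the gsubfn dependency, the same idea can be sketched in base R with gregexpr() and regmatches<-. This is only a rough alternative, not the approach above: the names neg_re and str_negate_base are made up for illustration, the punctuation set [.,;:!?] is an assumption about what should end a negated span, and it relies on the PCRE (*SKIP)(*FAIL) verbs being available with perl = TRUE.

```r
# Negation words from the question; the \\b anchors keep words such as
# "nothing" or "notice" from being treated as "not" (an added assumption).
neg_re <- "(?:\\bnot|n't|\\bnever|\\bwithout|\\bunlikely to)\\b"

str_negate_base <- function(x) {
  # A span is a negation word plus everything up to the next punctuation mark.
  span_re <- paste0("(?i)", neg_re, "[^.,;:!?]*")
  # Inside a span, skip the negation words themselves and capture every other word.
  word_re <- paste0("(?i)", neg_re, "(*SKIP)(*FAIL)|\\b([\\w']+)")
  m <- gregexpr(span_re, x, perl = TRUE)
  # Rewrite each matched span in place, prefixing the captured words with NOT_.
  regmatches(x, m) <- lapply(regmatches(x, m), function(spans) {
    gsub(word_re, "NOT_\\1", spans, perl = TRUE)
  })
  x
}

x <- 'It was unlikely to work and it seems like it never was going to end.'
str_negate_base(x)
## [1] "It was unlikely to NOT_work NOT_and NOT_it NOT_seems NOT_like NOT_it never NOT_was NOT_going NOT_to NOT_end."
```

Note that a contraction appearing later in an already negated span (for example "didn't" after "never") is tagged as an ordinary word in this sketch, so adjust word_re if you need different behaviour there.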
answered 2014-08-25T23:34:28.103