r - Finding word combinations with R

Question

I am editing some text and wondering whether I can search for certain words programmatically.

These words: almost, nearly, quite, close to, and very do not work next to these words: certain, complete, dead, entire, essential, and extinct.

Say I have this character vector:

text <- c("R is a very essential tool for data analysis. While it is regarded as domain specific, it is a very complete programming language. Almost certainly, many people who would benefit from using R, do not use it")

Can I get R to return a numeric vector giving the line numbers (or sentence numbers) where these words appear next to each other?
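If the text is stored one line per element (for example, as returned by `readLines()`), `grep()` on that vector already returns line numbers. A minimal sketch, assuming the word lists from the question and a made-up three-line `lines` vector; the pattern is an ordinary regular expression assembled with `paste0()`, not a feature of any package:

```r
# Hypothetical input: one element per line, e.g. readLines("draft.txt")
lines <- c("R is a very essential tool for data analysis.",
           "It is quite good at statistics.",
           "Almost certainly, many people do not use it.")

# Word lists taken from the question
intensifiers <- c("almost", "nearly", "quite", "close to", "very")
absolutes    <- c("certain", "complete", "dead", "entire", "essential", "extinct")

# An intensifier, whitespace, then the start of an absolute word
pattern <- paste0("\\b(", paste(intensifiers, collapse = "|"), ")\\s+(",
                  paste(absolutes, collapse = "|"), ")")

# grep() returns the indices, i.e. the line numbers, of matching elements
grep(pattern, lines, ignore.case = TRUE, perl = TRUE)
## [1] 1 3
```

Line 2 is skipped because "quite good" does not pair an intensifier with one of the absolute words.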

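Note that I used "certain" (the text contains "certainly"), so ideally R should search for words that contain "certain" or the other words, not only the whole word "certain" or the other words.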
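One way to meet this requirement is to anchor only the start of each word with `\\b` and leave the end open, so `certain` also matches `certainly`, while `very` is not matched inside a word such as `every`. A sketch, assuming the word lists above; `phrases` is made-up test input:

```r
intensifiers <- c("almost", "nearly", "quite", "close to", "very")
absolutes    <- c("certain", "complete", "dead", "entire", "essential", "extinct")

# \\b before the intensifier stops "very" matching inside "every";
# no \\b after the second group, so "certain" also matches "certainly"
pattern <- paste0("\\b(", paste(intensifiers, collapse = "|"), ")\\s+(",
                  paste(absolutes, collapse = "|"), ")")

phrases <- c("Almost certainly", "a very complete language", "very good", "uncertain")
grepl(pattern, phrases, ignore.case = TRUE, perl = TRUE)
## [1]  TRUE  TRUE FALSE FALSE
```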

score 2 · Accepted Answer

Andrie's solution is better suited to your needs, but I am offering a second solution for future searchers who want to parse transcripts.

library(qdap)
stext <- c("R is a very essential tool for data analysis. While it is regarded 
    as domain specific, it is a very complete programming language. Almost 
    certainly, many people who would benefit from using R, do not use it.")

dat <- sentSplit(data.frame(dialogue=stext), "dialogue")
with(dat, termco(dialogue, tot, "certain"))

##   tot word.count  certain
## 1 1.1          9        0
## 2 2.2         14        0
## 3 3.3         14 1(7.14%)

Note that punctuation matters: I had to add the missing period to the last sentence.

To get a vector of the sentences that contain "certain":

which(with(dat, termco(dialogue, tot, "certain"))$raw$certain > 0)
## [1] 3

score 2 · Accepted Answer

Use `grep` for this, after splitting the text at sentence boundaries with `strsplit`:

stext <- strsplit(text, split="\\.")[[1]]
grep("certain", stext)
## [1] 3
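Grepping for the single word "certain" finds only one sentence; to answer the original question (an intensifier directly followed by one of the absolute words), the same split can be combined with a pattern covering both word lists. A sketch, assuming sentences end in `.`, `!` or `?`; the word lists are copied from the question:

```r
text <- c("R is a very essential tool for data analysis. While it is regarded as domain specific, it is a very complete programming language. Almost certainly, many people who would benefit from using R, do not use it")

intensifiers <- c("almost", "nearly", "quite", "close to", "very")
absolutes    <- c("certain", "complete", "dead", "entire", "essential", "extinct")
pattern <- paste0("\\b(", paste(intensifiers, collapse = "|"), ")\\s+(",
                  paste(absolutes, collapse = "|"), ")")

# Split at sentence-ending punctuation (assumes no abbreviations or decimals)
sentences <- strsplit(text, split = "[.!?]")[[1]]

# Sentence numbers containing an intensifier + absolute pair
which(grepl(pattern, sentences, ignore.case = TRUE, perl = TRUE))
## [1] 1 2 3

# The first offending pair found in each matching sentence
hits <- regmatches(sentences,
                   regexpr(pattern, sentences, ignore.case = TRUE, perl = TRUE))
hits
## [1] "very essential" "very complete"  "Almost certain"
```

`regexpr()` reports only the first match per sentence; `gregexpr()` would be needed if a sentence can contain several pairs.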
