
I'm editing some text and wondering whether I can search for particular words programmatically.

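These words: almost, nearly, quite, close to, and very, do not work when placed next to these words: certain, complete, dead, entire, essential, and extinct.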

Suppose I have this character vector:

text <- c("R is a very essential tool for data analysis. While it is regarded as domain specific, it is a very complete programming language. Almost certainly, many people who would benefit from using R, do not use it")

Can I get R to return a numeric vector giving the line numbers (or sentence numbers) where these words are placed next to each other?

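Note that the text says "certainly", not "certain", so ideally R should search for words that contain "certain" (or any of the other words), rather than only the exact word "certain" (or the other exact words).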
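In R regular expressions a plain pattern already matches the stem anywhere inside a word, which is the behaviour asked for here; \b word-boundary anchors are what restrict a match to whole words. A minimal illustration, where the words vector is a made-up example:

```r
# Hypothetical sample words, all containing the stem "certain"
words <- c("certain", "certainly", "uncertain", "certainty")

# Plain pattern: matches the stem anywhere in a word
grepl("certain", words, perl = TRUE)
## [1] TRUE TRUE TRUE TRUE

# Boundaries on both sides: only the exact word "certain"
grepl("\\bcertain\\b", words, perl = TRUE)
## [1]  TRUE FALSE FALSE FALSE

# Boundary on the left only: words that start with the stem
grepl("\\bcertain", words, perl = TRUE)
## [1]  TRUE  TRUE FALSE  TRUE
```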


2 Answers


Andrie's solution is better suited to your needs, but I'm offering a second solution for future searchers who want to parse transcripts.

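library(qdap)
# Same text as in the question, with the final period added after "use it"
stext <- c("R is a very essential tool for data analysis. While it is regarded 
    as domain specific, it is a very complete programming language. Almost 
    certainly, many people who would benefit from using R, do not use it.")

# Split into one row per sentence; the added "tot" column identifies each sentence
dat <- sentSplit(data.frame(dialogue=stext), "dialogue")
# Count matches of "certain" in each sentence
with(dat, termco(dialogue, tot, "certain"))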

##   tot word.count  certain
## 1 1.1          9        0
## 2 2.2         14        0
## 3 3.3         14 1(7.14%)

Note that punctuation matters: I had to add the missing period to the last sentence.

To get a vector of the sentences that contain "certain":

which(with(dat, termco(dialogue, tot, "certain"))$raw$certain > 0)
## [1] 3
answered 2013-04-29T13:22:46.107

Use grep for this, after splitting the text at sentence boundaries with strsplit:

stext <- strsplit(text, split="\\.")[[1]]
grep("certain", stext)
[1] 3
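Both answers search only for "certain". Extending the strsplit() plus grep() idea to the full question, a minimal sketch could build a single regular expression from the two word lists and return the matching sentence numbers. The helper name find_modifier_pairs, the lookbehind-based sentence splitter, and the default word lists (copied from the question) are assumptions for illustration, not part of either answer:

```r
# Hypothetical helper (not part of base R or qdap): return the numbers of the
# sentences in which a modifier word directly precedes a word that contains
# one of the "absolute" stems (so "certainly" counts for "certain").
find_modifier_pairs <- function(text,
                                modifiers = c("almost", "nearly", "quite",
                                              "close to", "very"),
                                absolutes = c("certain", "complete", "dead",
                                              "entire", "essential", "extinct")) {
  # Split after ".", "!" or "?" followed by whitespace; unlist() numbers the
  # sentences of all elements of `text` consecutively
  sentences <- unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE))
  # \\b(modifier)\\s+ : the modifier as a whole word, followed by whitespace
  # \\w*(stem)\\w*    : the very next word, containing an absolute stem
  pattern <- paste0("\\b(", paste(modifiers, collapse = "|"), ")\\s+",
                    "\\w*(", paste(absolutes, collapse = "|"), ")\\w*")
  # ignore.case so that "Almost" at the start of a sentence also matches
  grep(pattern, sentences, ignore.case = TRUE, perl = TRUE)
}

text <- c("R is a very essential tool for data analysis. While it is regarded as domain specific, it is a very complete programming language. Almost certainly, many people who would benefit from using R, do not use it")

find_modifier_pairs(text)
## [1] 1 2 3
```

Because the stem may appear anywhere in the following word, "Almost certainly" in sentence 3 is caught along with "very essential" and "very complete". Removing the \\w* on either side of the stem group would restrict the search to the exact words instead.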
answered 2013-04-29T11:40:45.707