r - Printing the occurrences/positions of words

Question

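I have tried a few different packages to build an R program that takes a text file as input and produces a list of the words in that file. Each word should have a vector containing every position at which that word occurs in the file. For example, if the text file contains the string: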

"this is a nice text with nice characters"

The output should look like this:

$this
[1] 1

$is
[1] 2

$a
[1] 3

$nice
[1] 4 7

$text
[1] 5

$with
[1] 6

$characters
[1] 8

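I came across a useful post, http://r.789695.n4.nabble.com/Memory-usage-in-R-grows-considerably-while-calculating-word-frequencies-td4644053.html, but it does not record the position of each word. I also found a similar function called "str_locate", but it reports positions in characters, and I want to count in words.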
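To make the characters-versus-words distinction concrete, here is a small base-R sketch. It uses `gregexpr()` as a stand-in for stringr's `str_locate_all()` (both report character offsets, which is the problem described above), and then counts in words instead by splitting the string into words first.

```r
# Character offsets vs. word indices for the same string (base R only)
x <- "this is a nice text with nice characters"

# gregexpr() reports where each match starts, counted in characters
char_pos <- as.integer(gregexpr("nice", x, fixed = TRUE)[[1]])
print(char_pos)   # 11 26

# Splitting into words first lets us count in words instead
words <- strsplit(x, split = "\\s+")[[1]]
word_pos <- which(words == "nice")
print(word_pos)   # 4 7
```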

Any guidance on which package or technique to use would be appreciated.

score 7 · Accepted Answer

You can do this with base R (curiously, it produces exactly the output you suggested):

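# data
x <- "this is a nice text with nice characters"
# split on runs of whitespace
words <- strsplit(x, split = "\\s+")[[1]]
# find the positions of every word; simplify = FALSE keeps the result a list
# even when every word occurs exactly once
sapply(unique(words), function(w) which(words == w), simplify = FALSE)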

### result ###
$this
[1] 1

$is
[1] 2

$a
[1] 3

$nice
[1] 4 7

$text
[1] 5

$with
[1] 6

$characters
[1] 8
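Since the question asks for a text file as input, here is a minimal sketch that wraps the same idea in a function, assuming a plain-text file and treating any run of whitespace as a word separator (punctuation is not stripped). The function name `word_positions` is hypothetical. Instead of `sapply()`, it groups the position indices with base R's `split()`, using a factor whose levels are in first-appearance order so the output matches the order shown above.

```r
# Sketch: word positions for a whole text file, base R only.
# word_positions() is a hypothetical helper name, not an existing function.
word_positions <- function(path) {
  # Read every line and join them so positions run across line breaks
  text <- paste(readLines(path, warn = FALSE), collapse = " ")
  # Split on runs of whitespace; drop empty tokens (e.g. from leading spaces)
  words <- strsplit(text, split = "\\s+")[[1]]
  words <- words[words != ""]
  # Group each position index by its word, keeping first-appearance order
  split(seq_along(words), factor(words, levels = unique(words)))
}

# Usage with a temporary file containing the example sentence
tmp <- tempfile(fileext = ".txt")
writeLines("this is a nice text with nice characters", tmp)
word_positions(tmp)
```

`split()` always returns a named list, so there is no risk of the result collapsing into a plain vector when every word is unique.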
