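As Gavin Simpson suggested, you can use aspell. I think you need to have aspell installed for this to work. It is included by default in many Linux distributions; I don't know about other systems, or whether it is installed together with R.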
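If you are unsure whether aspell is available, here is a minimal sketch of a check from within R. The helper name `has_program` is made up for illustration. It relies only on base R's `Sys.which()`, which returns an empty string when a program is not found on the search path.

```r
# Hypothetical helper: TRUE if an executable called `name` is on the PATH.
has_program <- function(name) {
  # Sys.which() returns "" for programs it cannot find
  unname(nzchar(Sys.which(name)))
}

if (!has_program("aspell")) {
  message("aspell was not found on the PATH; install it before calling aspell() in R.")
}
```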
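See the function below for a usage example. The details depend on your input data and on what you want to do with the result (for example, correcting each misspelling with the first suggestion), neither of which you specified: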
check_spelling <- function(text) {
  # Strip commas and periods, then split the text into words on spaces
  text <- gsub("[,.]", "", text)
  text <- strsplit(text, " ", fixed = TRUE)[[1]]
  # Write one word per line to a temporary file
  filename <- tempfile()
  writeLines(text, con = filename)
  # Check the spelling of the file using aspell
  result <- aspell(filename)
  # Extract the list of suggestions, named by the misspelled word
  suggestions <- result$Suggestions
  names(suggestions) <- result$Original
  # Remove the temporary file
  unlink(filename)
  suggestions
}
> text <- "I am text mining a large database to create indicator variables which indicate the occurence of certain phrases in a comments field of an observation. The comments were entered by technicians, so the terms used are always consistent. "
> check_spelling(text)
$occurence
[1] "occurrence" "occurrences" "occurrence's"
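
One thing you might do with the result is replace each misspelled word with its first suggestion. This is only a sketch: the helper name `apply_first_suggestion` is hypothetical, it assumes the misspelled words contain no regular-expression metacharacters, and the first suggestion is not guaranteed to be the word you meant.

```r
# Hypothetical helper: replace each misspelled word with its first suggestion.
# `suggestions` is a named list such as the one returned by check_spelling().
apply_first_suggestion <- function(text, suggestions) {
  for (word in unique(names(suggestions))) {
    candidates <- suggestions[[word]]
    # Skip words for which there is no suggestion
    if (length(candidates) == 0) next
    # \\b restricts the match to whole words, so substrings are left alone
    pattern <- paste0("\\b", word, "\\b")
    text <- gsub(pattern, candidates[1], text, perl = TRUE)
  }
  text
}

# Usage with a hand-written suggestion list, so aspell itself is not needed here
suggestions <- list(occurence = c("occurrence", "occurrences", "occurrence's"))
apply_first_suggestion("indicate the occurence of certain phrases", suggestions)
# [1] "indicate the occurrence of certain phrases"
```

On real data you would probably want to review the suggestions before applying them, since technical terms are often flagged as misspellings.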