I am building a term-document matrix with the tm library.
library(tm)

# Create corpus.
corporize <- function(dir_to_corporize)
{
  # Read every plain-text file in the directory into a corpus.
  crp <- Corpus(DirSource(dir_to_corporize, mode="text", encoding="ASCII"),
                readerControl=list(reader=readPlain, language="en_EN"))
  # Clean-up steps, applied in this order.
  crp <- tm_map(crp, removeWords, stopwords("english"))
  crp <- tm_map(crp, removePunctuation, preserve_intra_word_dashes=FALSE)
  crp <- tm_map(crp, removeNumbers)
  crp <- tm_map(crp, stripWhitespace)
  crp <- tm_map(crp, content_transformer(tolower))
}
However, when I inspect my term-document matrix, I find that several stopwords are still present:
the last time i saw
we need talk about kevin
you make me feel like
Why does this happen, and what can I do about it?