r - quanteda textstat_simil for text matching

Question

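Hello text miners,

I'm fairly new to this field, and I'm trying to use quanteda's textstat_simil (R package) to assess the similarity between phrases. These are very early steps, so I'm sure I'm missing something obvious, but I still can't get the feature selection to work: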

    library(quanteda)

    # 1. Create the corpus and the words to check against it
    myCorpus <- corpus(c("Anna, Maria, Luisa"))
    checkWords <- c('Luisianna', 'anneta')
    summary(myCorpus)

    myDfm <- dfm(myCorpus)
    myDfm # check that the features are there

    # Remove stopwords and punctuation; keep the stems
    myDfmNoStop <- dfm(myCorpus, remove = stopwords("english"), stem = TRUE, remove_punct = TRUE)

    sim <- textstat_simil(myDfmNoStop, checkWords, method = "cosine", margin = "features")

This returns the error:

"Error in textstat_simil.dfm(myDfm, checkWords, method = "cosine", margin = "features"): features specified by 'selection' do not exist."

So it is not clear to me how to specify the correct features/words of my corpus.

Needless to say, any feedback is very welcome :)

Cheers,

George

score 2 · Accepted Answer

Try this:

    library(quanteda)

    # Put the words to check in their own document, next to the target documents
    myCorpus <- corpus(c(check = "Luisianna, anneta",
                         target1 = "Anna, Maria, Luisa",
                         target2 = "Anna, anneta"))

    myDfmNoStop <- dfm(myCorpus, remove = stopwords("english"), stem = TRUE, remove_punct = TRUE)

    # Compare every document with the "check" document
    sim <- textstat_simil(myDfmNoStop, myDfmNoStop['check',], method = "cosine", margin = "documents")
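To see why the original call failed and what the document-level comparison measures, here is a minimal base R sketch that does not use quanteda at all. The count matrix is hand-built and is an assumption for illustration: it supposes that lowercasing is the only normalization applied, whereas the real features depend on quanteda's tokenization and stemming. The helper `cosine` is likewise a hypothetical name, not a quanteda function.

```r
# Features of the single-document corpus from the question (assumed, lowercased).
question_features <- c("anna", "maria", "luisa")
check_words <- c("luisianna", "anneta")

# Neither check word is a feature of that dfm, which is why a feature
# selection on them cannot succeed.
print(check_words %in% question_features)

# Hand-built document-feature counts for the three documents in the answer
# (illustrative values, not quanteda output).
counts <- rbind(
  check   = c(1, 1, 0, 0, 0),
  target1 = c(0, 0, 1, 1, 1),
  target2 = c(0, 1, 1, 0, 0)
)
colnames(counts) <- c("luisianna", "anneta", "anna", "maria", "luisa")

# Cosine similarity between two numeric vectors.
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Similarity of every document (row) to the "check" document.
sim_to_check <- apply(counts, 1, cosine, b = counts["check", ])
print(round(sim_to_check, 3))
```

Under these assumed counts, target1 shares no feature with the check document, so its similarity is zero, while target2 shares only "anneta". Cosine similarity on a dfm compares whole tokens, so "Luisianna" and "Luisa" count as unrelated features no matter how alike they look. Also note that in recent quanteda releases textstat_simil() is provided by the separate quanteda.textstats package, so the code above may additionally need library(quanteda.textstats).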
