r - Quanteda - Extracting identified dictionary words

Question

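I am trying to extract the identified dictionary words from a Quanteda dfm, but I have not been able to find a solution.

Does anyone have a solution for this?

Sample input: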

dict <- dictionary(list(season = c("spring", "summer", "fall", "winter")))
dfm  <- dfm("summer is great", dictionary  = dict)

Output:

 > dfm
 Document-feature matrix of: 1 document, 1 feature.
 1 x 1 sparse Matrix of class "dfmSparse"

   features
docs    season
text1      1

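I now know that a word from the season dictionary was identified in the sentence, but I would also like to know which word it was.

Ideally this would be extracted in a table format: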

docs    dict     dictWord
text1   season   summer

score 1 · Accepted Answer

You can create a second dfm using the select argument (keptFeatures in older quanteda versions), then cbind() it to the first, dictionary-based dfm.

dict <- dictionary(list(season = c("spring", "summer", "fall", "winter")))
txt <- "summer is great"
season_dfm  <- dfm(txt, dictionary  = dict, verbose = FALSE)
dict_dfm <- dfm(txt, select = dict, verbose = FALSE)

combined_dfm <- cbind(season_dfm, dict_dfm)
combined_dfm
## Document-feature matrix of: 1 document, 2 features.
## 1 x 2 sparse Matrix of class "dfmSparse"
##       season summer
## text1      1      1

If you want the output as a table, it would be:

dict_df <- as.data.frame(combined_dfm)
names(dict_df)[2] <- "dictWord"
dict_df
##       season dictWord
## text1      1        1

However, this only works when each text contains exactly one dictionary value; otherwise dict_dfm will have multiple feature columns.
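
For texts with several dictionary hits, one possible workaround is to build the long docs / dict / dictWord table directly. The following is only a sketch in base R, not a quanteda feature: match_dict_words is a hypothetical helper name, the dictionary is passed as a plain named list, tokenisation is a naive split on non-letter characters, and matching is exact on lowercase words (no glob or regex patterns as quanteda dictionaries allow).

```r
# Hypothetical helper (not part of quanteda): one row per document and
# matched dictionary word. Assumes a plain named list as the dictionary.
match_dict_words <- function(texts, dict_list) {
  if (is.null(names(texts))) {
    names(texts) <- paste0("text", seq_along(texts))
  }
  # Long lookup table: one row per (key, word) pair in the dictionary.
  lookup <- data.frame(
    dict = rep(names(dict_list), lengths(dict_list)),
    dictWord = unlist(dict_list, use.names = FALSE),
    stringsAsFactors = FALSE
  )
  rows <- lapply(names(texts), function(doc) {
    # Naive tokenisation: lowercase, then split on anything that is not a-z.
    toks <- strsplit(tolower(texts[[doc]]), "[^a-z]+")[[1]]
    hits <- lookup[lookup$dictWord %in% toks, , drop = FALSE]
    if (nrow(hits) == 0) return(NULL)
    data.frame(docs = doc, hits, stringsAsFactors = FALSE)
  })
  result <- do.call(rbind, rows)
  if (is.null(result)) {
    # No document matched anything: return an empty table with the same columns.
    return(data.frame(docs = character(0), dict = character(0),
                      dictWord = character(0), stringsAsFactors = FALSE))
  }
  rownames(result) <- NULL
  result
}

season_words <- list(season = c("spring", "summer", "fall", "winter"))
txts <- c(text1 = "summer is great", text2 = "winter follows fall")
match_dict_words(txts, season_words)
```

Each matched word is reported once per document, in dictionary order rather than text order. If counts are needed, the %in% test could be replaced by tabulating toks against lookup$dictWord.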
