r - Adding metadata to STM in R

Question

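I'm having trouble with the STM package in R. I built a corpus in quanteda and want to convert it to the STM format. I saved the metadata as a separate CSV file, and I'm looking for code that merges the text documents with that metadata. The readCorpus() and convert() functions do not automatically add the metadata to the corpus.

This is what it looks like in quanteda: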

EUdocvars <- read.csv("EU_metadata.csv", stringsAsFactors = FALSE)

EUdocvars$Period <- as.factor(EUdocvars$Period)
EUdocvars$Country <- as.factor(EUdocvars$Country)
EUdocvars$Region <- as.factor(EUdocvars$Region)

EUCorpus <- corpus(textfile(file='PROJECT/*.txt'), encodingFrom = "UTF-8-BOM")
docvars(EUCorpus) <- EUdocvars

EUDfm <- dfm(EUCorpus)

Is there a way to do the same thing with the STM package?

score 2 · Accepted Answer

Support for this was added just recently (v0.99), after addressing https://github.com/kbenoit/quanteda/issues/209.

So this should work:

EUstm <- convert(EUDfm, to = "stm", docvars = docvars(EUCorpus))

And then EUstm has all of the elements including meta that you need for fitting STM models.
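As a minimal sketch of that conversion, the snippet below builds a three-document toy corpus and checks what convert() returns. The texts and metadata values are made up for illustration, and it assumes a recent quanteda release, where a dfm is built from tokens() rather than through the older textfile() call used in the question.

```r
library(quanteda)

# Toy stand-ins for the EU texts and metadata (hypothetical values)
txt <- c(doc1 = "trade policy and market access",
         doc2 = "migration policy and border control",
         doc3 = "market regulation and trade rules")
meta <- data.frame(Period = factor(c("early", "late", "late")),
                   Region = factor(c("North", "South", "North")))

# Attach the metadata as document variables, then build the dfm
corp <- corpus(txt, docvars = meta)
dfmat <- dfm(tokens(corp))

# Convert to the list format used by stm, passing the docvars explicitly
out <- convert(dfmat, to = "stm", docvars = docvars(corp))

# Inspect the elements and confirm one metadata row per document
print(names(out))
print(nrow(out$meta) == length(out$documents))
```

The resulting elements can then be handed to stm() through its documents, vocab, and data arguments, with covariates named in a formula such as prevalence = ~ Region.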

score 0 · Accepted Answer

The stm object (a list) has an element named $meta, which takes a data frame of dimension number of documents x number of covariates. So for your problem:

EUCorpus$meta <- EUdocvars
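Because $meta must have exactly one row per document, in the same order as the documents, it can help to align the CSV rows before assigning them. The following is only a sketch in base R: the doc_id column, the toy documents, and the object name out are assumptions for illustration and are not part of the original data.

```r
# Hypothetical stm-style pieces: each document is a 2-row integer matrix
# (row 1 = vocabulary index, row 2 = count of that term)
vocab <- c("border", "market", "policy", "trade")
documents <- list(
  doc1 = matrix(c(2L, 1L, 3L, 1L, 4L, 1L), nrow = 2),
  doc2 = matrix(c(1L, 1L, 3L, 2L), nrow = 2)
)

# Metadata as it might come from a CSV, in a different row order
# (doc_id is an assumed key column matching the document names)
meta_raw <- data.frame(doc_id = c("doc2", "doc1"),
                       Region = factor(c("South", "North")))

# Reorder the metadata rows to follow the order of the documents
meta <- meta_raw[match(names(documents), meta_raw$doc_id), , drop = FALSE]

# Fail early if any document has no metadata row
stopifnot(!anyNA(meta$doc_id), nrow(meta) == length(documents))

# Assemble the list with a $meta element of size documents x covariates
out <- list(documents = documents, vocab = vocab, meta = meta)
```

Matching on an explicit key avoids relying on the CSV happening to be sorted the same way as the text files.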
