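I'm doing topic modeling on different categories in a dataset. Before that, I need to split the data into separate data frames by category so that I can convert each one into a document-term matrix. Based on what I know about for loops, I have the following. The part I'm stuck on is that I need an output for each item in the list.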
category = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
for (i in category) {
  #Subset to test topic model
  someDataFrame = anotherDataFrame %>%
    filter(colVariable == i) %>% #here is the column of interest in the dataframe
    select(ID, Word) %>%
    group_by(ID, Word) %>%
    count()
  newDataFrame_i = someDataFrame %>% #here's where I'd like to export to individual dataframes
    cast_dtm(ID, Word, n) #in order to do topic modeling, you have to build a document-term matrix
}
As I said above, I'd like one data frame for each item in the list, but I keep getting: Error in (function (cl, name, valueClass) : assignment of an object of class “numeric” is not valid for @‘Dim’ in an object of class “dgTMatrix”; is(value, "integer") is not TRUE.
I have done this with a single hard-coded value (say, "a") and got the result I was looking for, so I know the problem is in my for loop.
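Separately from the error, it may help to note that R does not substitute the loop variable into a name such as newDataFrame_i; the name is taken literally. A minimal base R sketch of that behavior (the names result_i and result_a are made up for illustration):

```r
# The "_i" suffix is part of the literal variable name; it is not
# replaced by the current value of i.
for (i in c("a", "b", "c")) {
  result_i <- toupper(i)  # overwrites the same variable on every pass
}

# Only the value from the last iteration is left.
print(result_i)

# No per-item variable such as result_a was ever created.
print(exists("result_a"))
```

So even once the cast itself succeeds, a loop written this way would leave a single object behind rather than one per category.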
Solution:
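library(dplyr)    # filter(), select(), group_by(), count(), ungroup()
library(tidytext) # cast_dtm()

filter_and_cast <- function(df, category){
  df %>%
    filter(colVariable == category) %>% #here is the column of interest in the dataframe; compare against the function argument, not the global i
    select(ID, Word) %>%
    group_by(ID, Word) %>%
    count() %>%
    ungroup() %>% #drop the grouping before casting
    cast_dtm(ID, Word, n) #build the document-term matrix
}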
for (i in category) {
  cast = paste("filterCast", i, sep = "_") #build a name such as filterCast_a
  try(assign(cast, filter_and_cast(aDataFrame, i))) #try() lets the loop continue if one category fails
}
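The loop above creates one variable per category with assign(). A common alternative is to keep the results together in a named list, which tends to be easier to iterate over afterwards. The following is only a sketch under stated assumptions: toy, count_matrix, pieces, and results are hypothetical names, the data are invented, and count_matrix uses base R xtabs() as a stand-in for cast_dtm() so that the example runs without extra packages. In real use the body of count_matrix would be replaced by the dplyr/tidytext pipeline.

```r
# Toy data standing in for the real data frame (invented values).
toy <- data.frame(
  colVariable = c("a", "a", "a", "b", "b"),
  ID          = c(1, 1, 2, 3, 3),
  Word        = c("cat", "dog", "cat", "fish", "fish"),
  stringsAsFactors = FALSE
)

# Stand-in for the casting step: a dense ID x Word count table.
# This is NOT cast_dtm(); it only mimics the shape of the result.
count_matrix <- function(df) {
  xtabs(~ ID + Word, data = df)
}

# split() returns one data frame per category, named by category value.
pieces <- split(toy, toy$colVariable)

# lapply() preserves those names, so each result can be looked up by category.
results <- lapply(pieces, count_matrix)

print(names(results))
print(results[["a"]])
```

With this layout, results[["a"]] plays the role of filterCast_a, and a later modeling step can be applied to every category with another lapply() call.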
Thanks to the contributors, I was finally able to solve my problem.