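I am doing topic modeling on different categories in a dataset. Before that, I need to split the data into separate data frames by category so that I can convert each of them into a document-term matrix. Based on what I know about for loops, I have the following. The part I'm stuck on is that I need one output for each item in the list.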


category = c("a",
         "b",
         "c",
         "d",
         "e",
         "f",
         "g",
         "h",
         "i",
         "j")

for (i in category) {

#Subset to test topic model
someDataFrame = anotherDataFrame %>%
  filter(colVariable == i) %>% #here is the column of interest in the dataframe
  select(ID, Word) %>%
  group_by(ID, Word) %>%
  count()

newDataFrame_i = someDataFrame %>% #here's where I'd like to export to individual dataframes
  cast_dtm(ID, Word, n) #in order to do topic modeling, you have to build a document-term matrix 


}

As I said before, I would like one data frame for each item in the list, but I keep getting: Error in (function (cl, name, valueClass) : assignment of an object of class “numeric” is not valid for @‘Dim’ in an object of class “dgTMatrix”; is(value, "integer") is not TRUE.

I have done this with a single hard-coded value (say "a") and got the result I was looking for, so I know the problem is in my for loop.

Solution:

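library(dplyr)    # filter(), select(), group_by(), count(), %>%
library(tidytext) # cast_dtm()

filter_and_cast <- function(df, category){
  df %>%
  filter(colVariable == category) %>% #use the function argument, not the loop variable i
  select(ID, Word) %>%
  group_by(ID, Word) %>%
  count() %>%
  ungroup() %>%
  cast_dtm(ID, Word, n) #build the document-term matrix
}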

for (i in category) {
  cast = paste("filterCast", i, sep = "_")
  try(assign(cast, filter_and_cast(aDataFrame, i)))
}

Thanks to the contributors, I was finally able to solve my problem.

2 Answers

You just need to save the intermediate output in an object:

category = c("a",
         "b",
         "c",
         "d",
         "e",
         "f",
         "g",
         "h",
         "i",
         "j")
out <- list()
for (i in category) {

#Subset to test topic model
someDataFrame = anotherDataFrame %>%
  filter(colVariable == i) %>% #here is the column of interest in the dataframe
  select(ID, Word) %>%
  group_by(ID, Word) %>%
  count()

out[[i]] = someDataFrame %>% #here's where I'd like to export to individual dataframes
  cast_dtm(ID, Word, n) #in order to do topic modeling, you have to build a document-term matrix 


}
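
Because `i` is a character string, `out[[i]]` stores each result under its category name, so the results can be retrieved by name afterwards. The following is a minimal base-R sketch of that pattern only: the `toy` data frame is made up for illustration, and `table()` is a stand-in for `cast_dtm()` so that the example runs without extra packages.

```r
# Hypothetical toy data standing in for anotherDataFrame
toy <- data.frame(
  colVariable = c("a", "a", "b", "b", "b"),
  ID          = c(1, 1, 2, 3, 3),
  Word        = c("x", "y", "x", "x", "z"),
  stringsAsFactors = FALSE
)

out <- list()
for (i in c("a", "b")) {
  sub <- toy[toy$colVariable == i, ]
  # table() is only a stand-in for cast_dtm(): rows are IDs, columns are words
  out[[i]] <- table(sub$ID, sub$Word)
}

names(out)   # the list is named after the categories
out[["b"]]   # look up one category's matrix by name
```

Keeping everything in one named list also makes it easy to loop over the results later, for example with `lapply(out, dim)`.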

Or, in a more R-esque style:

#requires dplyr and tidytext to be loaded
filter_and_cast <- function(df, category){
  df %>%
  filter(colVariable == category) %>% #here is the column of interest in the dataframe
  select(ID, Word) %>%
  group_by(ID, Word) %>%
  count() %>%
  ungroup() %>%
  cast_dtm(ID, Word, n)
}

Then you can do something like:

map(category, filter_and_cast, df = anotherDataFrame) #map() comes from purrr
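
One detail worth noting: mapping over a plain character vector gives back a list whose elements are not labelled by category unless the input itself carries names. Here is a hedged base-R sketch of one way to get a named result, using `lapply()` with `setNames()`. The `toy` data and the `filter_and_tabulate()` helper are hypothetical, and `table()` again stands in for `cast_dtm()`. With purrr, `set_names()` should play a similar role before `map()`.

```r
# Hypothetical toy data standing in for anotherDataFrame
toy <- data.frame(
  colVariable = c("a", "a", "b", "b", "b"),
  ID          = c(1, 1, 2, 3, 3),
  Word        = c("x", "y", "x", "x", "z"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: same shape as filter_and_cast(), but base R only
filter_and_tabulate <- function(df, category) {
  sub <- df[df$colVariable == category, ]
  table(sub$ID, sub$Word)  # stand-in for cast_dtm()
}

categories <- c("a", "b")

# Naming the input vector by itself makes lapply() return a named list
dtms <- lapply(setNames(categories, categories), filter_and_tabulate, df = toy)

names(dtms)   # one entry per category
dtms[["a"]]   # access a single category's matrix by name
```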
answered 2019-09-12T17:32:29.477
I think you can solve this with the assign() function, which creates an object from a name and a value that you pass to it.

Something like:

ObjectName = paste("newDataFrame", i, sep = "_")
assign(ObjectName, newDataFrame_i)
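
As a rough illustration of how objects created this way can be used afterwards, the sketch below creates them in a loop and then reads them back with `get()` and `mget()`. The values assigned here (`toupper(i)`) are placeholders for the real document-term matrices, and the variable names are made up for the example.

```r
categories <- c("a", "b")

for (i in categories) {
  object_name <- paste("newDataFrame", i, sep = "_")
  # toupper(i) is a placeholder for the real per-category result
  assign(object_name, toupper(i))
}

# The objects now exist under their generated names
newDataFrame_a

# get() fetches one object by its name as a string
get("newDataFrame_b")

# mget() collects several objects into a named list in one call
collected <- mget(paste("newDataFrame", categories, sep = "_"))
names(collected)
```

If the results are going to be processed together later, collecting them into a list like this (or storing them in a list from the start) tends to be easier to work with than many separate variables.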
answered 2019-09-12T17:36:44.587