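I have a large number of data.frames that need to be bound pairwise by column, then bound by row, before being fed into a predictive model. Since no values will be modified, I would like the final data.frame to point to the original data.frames in my list instead of copying them.
For example: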
library(pryr)
# individual data frames
df1 <- data.frame(a=1:1e6+0, b=1:1e6+1)
df2 <- data.frame(a=1:1e6+2, b=1:1e6+3)
df3 <- data.frame(a=1:1e6+4, b=1:1e6+5)
# each occupies 16 MB
object_size(df1) # 16 MB
object_size(df2) # 16 MB
object_size(df3) # 16 MB
object_size(df1, df2, df3) # 48 MB
# they will be stored in a named list
dfs <- list(df1=df1, df2=df2, df3=df3)
# putting them into a list doesn't create a copy
object_size(df1, df2, df3, dfs) # 48 MB
The final data.frame will have this layout (each unique pair of data.frames is bound by column, then the pairs are bound by row):
df1, df2
df1, df3
df2, df3
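To make the layout above concrete, here is a minimal sketch on toy data. The names `small`, `pairs_small`, `wide`, and `layout_small` are hypothetical and exist only for illustration; the 2-row frames are stand-ins for the real million-row ones.

```r
# Toy stand-ins for df1, df2, df3 (hypothetical 2-row frames)
small <- list(
  df1 = data.frame(a = 1:2,  b = 3:4),
  df2 = data.frame(a = 5:6,  b = 7:8),
  df3 = data.frame(a = 9:10, b = 11:12)
)

# every unique pair of names: (df1, df2), (df1, df3), (df2, df3)
pairs_small <- combn(names(small), 2, simplify = FALSE)

# bind each pair side by side, then stack the pairs on top of each other
wide <- lapply(pairs_small, function(p) cbind(small[[p[1]]], small[[p[2]]]))
layout_small <- do.call(rbind, wide)

dim(layout_small) # 6 rows (3 pairs x 2 rows), 4 columns
```

Each input frame appears in two of the three pairs, so the stacked result holds every original column twice.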
This is what I am currently doing:
# generate the unique df combinations
df_names <- names(dfs)
pairs <- combn(df_names, 2, simplify=FALSE)
# bind each pair of dfs by column
combo_dfs <- lapply(pairs, function(x) cbind(dfs[[x[1]]], dfs[[x[2]]]))
# no copies created yet
object_size(dfs, combo_dfs) # 48 MB
# bind the paired dfs by row
combo_df <- do.call(rbind, combo_dfs)
# now the data gets copied
object_size(combo_df) # 96 MB
object_size(dfs, combo_df) # 144 MB
How can I avoid copying my data but still get the same end result?