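I have been through various reshape questions, but I don't believe this variation has been asked before. I am working with a data frame of 81K rows and 4188 variables. Variables 161:4188 are measurements stored as separate variables. The idvar is in column 1. I want to repeat columns 1:160 and create new records for columns 161:4188. The final data frame will have 162 columns and 326,268,000 rows (81K * 4028 variables converted into unique records).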
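As a rough sanity check on those dimensions, the arithmetic can be sketched in R. The 8 bytes per cell is an assumption (it holds for doubles, while integers and factors take less), so the result is an order-of-magnitude estimate only, not a measurement.

```r
# Back-of-the-envelope size of the long-format result.
n_ids      <- 81000          # "81K" rows in the wide data frame
n_measures <- 4188 - 160     # columns 161:4188, i.e. 4028 measurement columns
n_rows     <- n_ids * n_measures
n_cols     <- 160 + 2        # repeated columns + network.show + viewership

bytes_per_cell <- 8          # assumption: every cell stored as a double
gib <- n_rows * n_cols * bytes_per_cell / 2^30

cat("rows:", format(n_rows, big.mark = ","), "\n")
cat("approx. size in GiB:", round(gib), "\n")
```

Under that assumption the fully materialised result is far larger than 8GB of RAM, whichever reshaping function produces it.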
Here is what I tried:
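# The data frame is named c, which shadows base::c; dput() was not needed here.
measure_cols <- names(c)[161:4188]  # measurement columns to stack into rows
reshapeddf <- reshape(c, idvar = "PID", varying = measure_cols,
v.names = "viewership",
timevar = "network.show",
times = measure_cols,
direction = "long")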
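The operation did not complete. I waited for almost 10 minutes. Is this the right approach? I am on a Windows 7 PC with 8GB RAM and an i5 at 3.20GHz. What is the most efficient way to do this transpose in R?
The answers from BondedDust and Nick are both clever, but I run into memory issues. Is there a way to implement any of the three approaches in this thread (reshape, tidyr, or do.call) using ff?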
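For reference, one way to bound memory is to stack the measurement columns a few at a time and append each piece to a file on disk, so the full long table never sits in RAM. The sketch below uses only base R and a plain CSV file rather than ff, so it is a stand-in technique and not an ff implementation. The helper name melt_in_chunks, the chunk size, and the toy data frame are all hypothetical.

```r
# Hypothetical helper: stack measure columns in small chunks and append to a CSV.
melt_in_chunks <- function(df, id_cols, measure_cols, path, chunk_size = 2L) {
  # Group the measurement column indices into chunks of at most chunk_size.
  chunks <- split(measure_cols, ceiling(seq_along(measure_cols) / chunk_size))
  first <- TRUE
  for (cols in chunks) {
    # Build the long-format piece for this chunk only.
    piece <- do.call(rbind, lapply(cols, function(j) {
      data.frame(df[, id_cols, drop = FALSE],
                 network.show = names(df)[j],
                 viewership = df[[j]],
                 stringsAsFactors = FALSE)
    }))
    # Write the header once, then append without it.
    write.table(piece, path, sep = ",", row.names = FALSE,
                col.names = first, append = !first)
    first <- FALSE
  }
  invisible(path)
}

# Toy data (hypothetical): 2 id columns, 3 measurement columns.
toy <- data.frame(PID = 1:3, HHID = c(10L, 10L, 20L),
                  show.A = c(0, 0, 10), show.B = c(0, 5, 0), show.C = c(1, 0, 0))
tmp <- tempfile(fileext = ".csv")
melt_in_chunks(toy, id_cols = 1:2, measure_cols = 3:5, path = tmp)
out <- read.csv(tmp)
```

Peak memory then scales with the chunk size rather than with all 4028 measurement columns, at the cost of the result living on disk.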
In the sample data below, columns 1:4 are the ones I want to repeat, and columns 5:9 are the ones to create new records for.
structure(list(PID = c(1003401L, 1004801L, 1007601L, 1008601L,
1008602L, 1011901L), HHID = c(10034L, 10048L, 10076L, 10086L,
10086L, 10119L), HH.START.DATE = structure(c(1378440000, 1362974400,
1399521600, 1352869200, 1352869200, 1404964800), class = c("POSIXct",
"POSIXt"), tzone = ""), VISITOR.CODE = structure(c(1L, 1L, 1L,
1L, 1L, 1L), .Label = c("0", "L"), class = "factor"), WEIGHTED.MINUTES.VIEWED..ABC...20.20.FRI = c(0,
0, 305892, 0, 101453, 0), WEIGHTED.MINUTES.VIEWED..ABC...BLACK.ISH = c(0,
0, 0, 0, 127281, 0), WEIGHTED.MINUTES.VIEWED..ABC...CASTLE = c(0,
27805, 0, 0, 0, 0), WEIGHTED.MINUTES.VIEWED..ABC...CMA.AWARDS = c(0,
679148, 0, 0, 278460, 498972), WEIGHTED.MINUTES.VIEWED..ABC...COUNTDOWN.TO.CMA.AWARDS = c(0,
316448, 0, 0, 0, 0)), .Names = c("PID", "HHID", "HH.START.DATE",
"VISITOR.CODE", "WEIGHTED.MINUTES.VIEWED..ABC...20.20.FRI", "WEIGHTED.MINUTES.VIEWED..ABC...BLACK.ISH",
"WEIGHTED.MINUTES.VIEWED..ABC...CASTLE", "WEIGHTED.MINUTES.VIEWED..ABC...CMA.AWARDS",
"WEIGHTED.MINUTES.VIEWED..ABC...COUNTDOWN.TO.CMA.AWARDS"), row.names = c(NA,
6L), class = "data.frame")
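To make the target shape concrete, here is a minimal sketch on a cut-down copy of the sample data (first three rows, two of the repeated columns, three of the measurement columns). The object names df, measure_cols, and long are illustrative assumptions; the reshape() arguments mirror the attempt above.

```r
# Cut-down copy of the sample data: 2 repeated columns, 3 measurement columns.
df <- data.frame(
  PID  = c(1003401L, 1004801L, 1007601L),
  HHID = c(10034L, 10048L, 10076L),
  WEIGHTED.MINUTES.VIEWED..ABC...20.20.FRI = c(0, 0, 305892),
  WEIGHTED.MINUTES.VIEWED..ABC...BLACK.ISH = c(0, 0, 0),
  WEIGHTED.MINUTES.VIEWED..ABC...CASTLE    = c(0, 27805, 0)
)

measure_cols <- names(df)[3:5]   # columns to turn into records

# One output row per (PID, measurement column) pair.
long <- reshape(df,
                idvar     = "PID",
                varying   = measure_cols,
                v.names   = "viewership",
                timevar   = "network.show",
                times     = measure_cols,
                direction = "long")

print(dim(long))
print(names(long))
```

Each of the 3 input rows appears once per measurement column, so the result has 9 rows; the repeated columns are carried along, and the former column names end up in network.show.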