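I've gone through various reshape questions but don't believe this variation has been asked before. I am working with a data frame of 81K rows and 4188 variables. Variables 161:4188 are measurements stored as separate variables. The idvar is in column 1. I want to repeat columns 1:160 and create new records for columns 161:4188. The final data frame will have 162 columns and 326,268,000 rows (81K * 4028 variables transformed into unique records).

Here is what I tried: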

reshapeddf <- reshape(c, idvar = "PID", varying = names(c)[161:4188], v.names = "viewership", timevar = "network.show", times = names(c)[161:4188], direction = "long")
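To check the shape of the result before running this on the full data, the same call can be tried on a tiny data frame. This is only a sketch: `dat` and the `SHOW.*` columns are made-up stand-ins, with 2 id columns in place of the real 160.

```r
# Toy stand-in for the real data (hypothetical values and column names).
dat <- data.frame(
  PID    = c(1L, 2L, 3L),
  HHID   = c(10L, 20L, 30L),
  SHOW.A = c(0, 5, 0),
  SHOW.B = c(7, 0, 0)
)

meas <- names(dat)[3:4]  # names(c)[161:4188] in the real data

# Same arguments as the call above: one long "viewership" column,
# with the original column name recorded in "network.show".
long <- reshape(dat,
                idvar     = "PID",
                varying   = meas,
                v.names   = "viewership",
                timevar   = "network.show",
                times     = meas,
                direction = "long")

dim(long)    # one row per PID/measurement pair
names(long)
```

Each original row appears once per measurement column, so the row count is `nrow(dat) * length(meas)`.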

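The operation didn't complete. I waited almost 10 minutes. Is this the right approach? I am using a Windows 7 PC with 8 GB RAM and a 3.20 GHz i5. What is the most efficient way to do this transposition in R? Both answers, from BondedDust and Nick, are neat, but I am running into memory problems. Is there a way to use any of the three approaches in this thread (reshape, tidyr, or do.call) together with ff?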
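For scale: a single numeric column of 326,268,000 rows takes about 2.6 GB at 8 bytes per value, so the full 162-column result cannot fit in 8 GB of RAM however it is built. One possible workaround, sketched here in base R only, is to reshape a block of measurement columns at a time and append each block to a file on disk. The toy data, the `chunk_size` value, and the CSV output are all assumptions for illustration; this does not use ff.

```r
# Toy stand-in for the real data (hypothetical values and column names).
dat <- data.frame(
  PID    = c(1L, 2L, 3L),
  HHID   = c(10L, 20L, 30L),
  SHOW.A = c(0, 5, 0),
  SHOW.B = c(7, 0, 0),
  SHOW.C = c(0, 0, 9),
  SHOW.D = c(1, 2, 3)
)

id_cols    <- 1:2            # 1:160 in the real data
meas_cols  <- 3:ncol(dat)    # 161:4188 in the real data
chunk_size <- 2              # measurement columns per pass; tune to available RAM
out_file   <- tempfile(fileext = ".csv")

# Split the measurement column indices into consecutive blocks.
chunks <- split(meas_cols, ceiling(seq_along(meas_cols) / chunk_size))

for (i in seq_along(chunks)) {
  # Reshape only this block; the id columns are recycled by cbind().
  long <- cbind(dat[id_cols], stack(dat[chunks[[i]]]))
  names(long)[names(long) == "values"] <- "viewership"
  names(long)[names(long) == "ind"]    <- "network.show"

  # Write the header once, then append the later blocks.
  write.table(long, out_file, sep = ",", row.names = FALSE,
              col.names = (i == 1), append = (i > 1))
}

# Reading everything back is only sensible for the toy example.
result <- read.csv(out_file)
```

Only one block of long-format rows is held in memory at a time, at the cost of the result living on disk rather than in a data frame.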

In the sample data below, columns 1:4 are the ones I want to repeat, and columns 5:9 are the ones to create new records for.

structure(list(PID = c(1003401L, 1004801L, 1007601L, 1008601L, 
1008602L, 1011901L), HHID = c(10034L, 10048L, 10076L, 10086L, 
10086L, 10119L), HH.START.DATE = structure(c(1378440000, 1362974400, 
1399521600, 1352869200, 1352869200, 1404964800), class = c("POSIXct", 
"POSIXt"), tzone = ""), VISITOR.CODE = structure(c(1L, 1L, 1L, 
1L, 1L, 1L), .Label = c("0", "L"), class = "factor"), WEIGHTED.MINUTES.VIEWED..ABC...20.20.FRI = c(0, 
0, 305892, 0, 101453, 0), WEIGHTED.MINUTES.VIEWED..ABC...BLACK.ISH = c(0, 
0, 0, 0, 127281, 0), WEIGHTED.MINUTES.VIEWED..ABC...CASTLE = c(0, 
27805, 0, 0, 0, 0), WEIGHTED.MINUTES.VIEWED..ABC...CMA.AWARDS = c(0, 
679148, 0, 0, 278460, 498972), WEIGHTED.MINUTES.VIEWED..ABC...COUNTDOWN.TO.CMA.AWARDS = c(0, 
316448, 0, 0, 0, 0)), .Names = c("PID", "HHID", "HH.START.DATE", 
"VISITOR.CODE", "WEIGHTED.MINUTES.VIEWED..ABC...20.20.FRI", "WEIGHTED.MINUTES.VIEWED..ABC...BLACK.ISH", 
"WEIGHTED.MINUTES.VIEWED..ABC...CASTLE", "WEIGHTED.MINUTES.VIEWED..ABC...CMA.AWARDS", 
"WEIGHTED.MINUTES.VIEWED..ABC...COUNTDOWN.TO.CMA.AWARDS"), row.names = c(NA, 
6L), class = "data.frame")
2 Answers

It could be as simple as this:

   dat2 <- cbind(dat[1:4], stack(dat[5:length(dat)]))
answered 2014-12-05T00:00:40.957
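A small illustration of what the `cbind()`/`stack()` combination produces, using made-up data with 2 id columns instead of 4 (the values and `SHOW.*` names are hypothetical):

```r
# Toy stand-in for the real data (hypothetical values and column names).
dat <- data.frame(
  PID    = c(1L, 2L, 3L),
  HHID   = c(10L, 20L, 30L),
  SHOW.A = c(0, 5, 0),
  SHOW.B = c(7, 0, 0)
)

# stack() concatenates the measurement columns into `values` and records
# the source column name in the factor `ind`; cbind() recycles the id
# columns to match the longer result.
dat2 <- cbind(dat[1:2], stack(dat[3:length(dat)]))

names(dat2)
dat2$PID
```

The id columns repeat once per measurement column, in column order, so the first block of rows holds `SHOW.A` and the second holds `SHOW.B`.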

I think this should work:

library(tidyr)
newdf <- gather(yourdf, program, minutes, -PID:-VISITOR.CODE)
answered 2014-12-05T00:16:40.940
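In current tidyr releases `gather()` is superseded by `pivot_longer()`, so the same reshape might be written as below. This is a sketch on made-up data (hypothetical values and `SHOW.*` names), assuming tidyr 1.0.0 or later is installed.

```r
library(tidyr)

# Toy stand-in for the real data (hypothetical values and column names).
yourdf <- data.frame(
  PID    = c(1L, 2L, 3L),
  HHID   = c(10L, 20L, 30L),
  SHOW.A = c(0, 5, 0),
  SHOW.B = c(7, 0, 0)
)

# Keep PID through HHID as id columns and pivot everything else:
# column names go to `program`, cell values go to `minutes`.
newdf <- pivot_longer(yourdf,
                      cols      = -(PID:HHID),
                      names_to  = "program",
                      values_to = "minutes")

names(newdf)
nrow(newdf)
```

Unlike `gather()`, `pivot_longer()` keeps the measurements of each original row together, so the rows come out in a different order but carry the same content.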