I am running the following code in R, shown here with a small example dataset:
library(plyr)
Ex<-structure(list(X1 = c(-36.8598, -37.1726, -36.4343, -36.8644,
-37.0599, -34.8818, -31.9907, -37.8304,
-34.3367, -31.2984, -33.5731),
X2 = c(64.26, 63.085, 66.36, 61.08, 61.57, 65.04, 72.69, 63.83,
67.555, 76.06, 68.61),
Y1 = c(493.81544, 493.81544, 494.54173,
494.61364, 494.61381, 494.38717, 494.64122, 493.73265, 494.04246,
494.92989, 494.98384),
Y2 = c(489.704166, 489.704166, 490.710962,
490.653212, 490.710612, 489.822928,
488.160904, 489.747776, 490.600579,
488.946738, 490.398958),
Y3 = c(19L, 19L, 19L, 23L, 30L,43L,43L,2L, 58L, 47L, 61L),
date = c("2013-06-01","2013-06-02","2013-06-03","2013-06-04",
"2013-06-05","2013-06-06","2013-06-07","2013-06-08",
"2013-06-09","2013-06-10","2013-06-11")),
.Names = c("X1", "X2", "Y1", "Y2", "Y3", "date"),
row.names = c(NA, 11L), class = "data.frame")
Ex <- arrange(Ex, Y3)
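# Flag repeated Y3 values; columns are referenced via Ex$, so attach() is not needed
Ex$Dup <- as.numeric(duplicated(Ex$Y3))
Ex$Dup_rev <- as.numeric(duplicated(Ex$Y3, fromLast = TRUE))
## Testing If Else
Ex$X5 <- 0
for (i in seq_len(nrow(Ex)))
{
  if (Ex$Dup[i] == 0 && Ex$Dup_rev[i] == 0)
  {
    # Y3 value occurs only once
    Ex$X5[i] <- Ex$Y2[i]
  } else if (Ex$Dup[i] == 0)
  {
    # first occurrence of a repeated Y3 value
    Ex$X5[i] <- Ex$Y2[i]
  } else
  {
    # later occurrence: add Y2 to the running total of the previous row
    Ex$X5[i] <- Ex$Y2[i] + Ex$X5[i - 1]
  }
}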
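The goal is to create a column X5 that holds the cumulative sum of Y2 over the rows that share the same Y3 value: when a Y3 value appears for the first time, X5 is just that row's Y2, and for each later row with the same Y3, X5 is that row's Y2 plus the X5 of the previous row. Because my data is large (about 110k rows), this code takes a long time to run. Is there a simpler way to do the same thing? The first few rows of the output: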
        X1     X2       Y1       Y2 Y3       date Dup Dup_rev        X5
1 -37.8304 63.830 493.7326 489.7478  2 2013-06-08   0       0  489.7478
2 -36.8598 64.260 493.8154 489.7042 19 2013-06-01   0       1  489.7042
3 -37.1726 63.085 493.8154 489.7042 19 2013-06-02   1       1 1469.1125
4 -36.4343 66.360 494.5417 490.7110 19 2013-06-03   1       0 1470.1193
5 -36.8644 61.080 494.6136 490.6532 23 2013-06-04   0       0  490.6532
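
One possible way to avoid the loop, offered as a sketch rather than a definitive answer: if X5 is meant to be the running sum of Y2 within each group of equal Y3 values (my reading of the description above), base R's ave() combined with cumsum() computes it in one vectorized call. The data frame `toy` below is a hypothetical stand-in for the first five rows of the sorted Ex, with only the two columns that matter.

```r
# Hypothetical stand-in for the first rows of Ex after sorting by Y3.
toy <- data.frame(
  Y2 = c(489.747776, 489.704166, 489.704166, 490.710962, 490.653212),
  Y3 = c(2L, 19L, 19L, 19L, 23L)
)

# ave() splits Y2 by the values of Y3, applies cumsum() to each piece,
# and returns the results in the original row order.
# Assumption: X5 should restart at every new Y3 value.
toy$X5 <- ave(toy$Y2, toy$Y3, FUN = cumsum)

print(toy)
```

On the real data the same call would be Ex$X5 <- ave(Ex$Y2, Ex$Y3, FUN = cumsum). Because ave() groups by value instead of looking at the previous row, it needs neither the Dup and Dup_rev helper columns nor a prior sort, which should matter at around 110k rows.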