
I have a data frame to which I want to add a column computed from three other columns. The method I am using right now is very slow. Is there a better way to do this? Here is my current approach:

library(bitops)

GetRes<-function(A, B, C){
  tagU <- bitShiftR((A*C), 4)
  tagV <- bitShiftR(B, 2)

  x<-tagU %% 2
  y<-tagV %% 4

  res<-(2*x + y) %% 4
  return(res)
}

df <- data.frame(id=letters[1:3],val0=1:3,val1=4:6,val2=7:9)
# apply() coerces the data frame to a matrix, so pass only the numeric columns
apply(df[, 2:4], 1, function(x) GetRes(x[1], x[2], x[3]))

My data frame is very big, and this computation is taking ages. Can someone suggest a better way to do it?

Thanks.


2 Answers


Try mapply:

mapply(GetRes, df[,2], df[,3], df[,4])

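If you let us know which package bitShiftR comes from, we can test it on larger data and see whether there is any performance gain.

Update
A quick benchmark shows that mapply is about twice as fast as your apply on the example data: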

library(microbenchmark)
microbenchmark(apply(df[,2:4], 1, function(x) GetRes(x[1], x[2], x[3])), mapply(GetRes, df[,2], df[,3], df[,4]))
Unit: microseconds
                                                      expr     min       lq   median      uq      max neval
 apply(df[, 2:4], 1, function(x) GetRes(x[1], x[2], x[3])) 196.985 201.6200 206.7515 216.187 1006.775   100
                 mapply(GetRes, df[, 2], df[, 3], df[, 4])  99.982 105.6105 108.7560 112.232  149.311   100
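
The benchmark above only covers the 3-row example. A rough sketch of how the same comparison could be run on larger data follows. The row count `n`, the value range, and the names `big`, `res_apply` and `res_mapply` are assumptions made for illustration, and no timings are claimed here since they depend on the machine.

```r
library(bitops)

GetRes <- function(A, B, C) {
  tagU <- bitShiftR((A * C), 4)
  tagV <- bitShiftR(B, 2)
  x <- tagU %% 2
  y <- tagV %% 4
  (2 * x + y) %% 4
}

# Hypothetical larger data set; the size and value range are arbitrary choices
set.seed(1)
n <- 1e4
big <- data.frame(val0 = sample(1000, n, replace = TRUE),
                  val1 = sample(1000, n, replace = TRUE),
                  val2 = sample(1000, n, replace = TRUE))

# Time the row-wise apply() and the element-wise mapply()
print(system.time(
  res_apply <- apply(big, 1, function(x) GetRes(x[1], x[2], x[3]))
))
print(system.time(
  res_mapply <- mapply(GetRes, big$val0, big$val1, big$val2)
))

# Both approaches should give the same values
stopifnot(isTRUE(all.equal(unname(res_apply), res_mapply)))
```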
Answered 2013-04-24T05:50:25.873

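Everything you are doing is already vectorized, so calling GetRes directly on the columns will be much faster than any of the other alternatives offered so far. You can call it like this: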

with(df, GetRes(val0, val1, val2))

or this:

GetRes(df$val0, df$val1, df$val2)

or this:

GetRes(df[,2], df[,3], df[,4])
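
Since the question asks for a new column, here is a minimal sketch of storing the vectorized result back in the data frame. The column name `res` is an assumption chosen for illustration. The final check only confirms that the direct call agrees with the element-wise mapply version on the example data.

```r
library(bitops)

GetRes <- function(A, B, C) {
  tagU <- bitShiftR((A * C), 4)
  tagV <- bitShiftR(B, 2)
  x <- tagU %% 2
  y <- tagV %% 4
  (2 * x + y) %% 4
}

df <- data.frame(id = letters[1:3], val0 = 1:3, val1 = 4:6, val2 = 7:9)

# One vectorized call over whole columns; "res" is a hypothetical column name
df$res <- GetRes(df$val0, df$val1, df$val2)

# Sanity check against the element-by-element version
stopifnot(all(df$res == mapply(GetRes, df$val0, df$val1, df$val2)))
```

This works because bitShiftR, the arithmetic operators and %% all operate element-wise on vectors, so no explicit loop over rows is needed.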
Answered 2013-04-24T06:40:38.933