I am trying to calculate the conditional standard deviation of a matrix B (for every column) based on the values of matrix A.

#conditional function
foo<-function(x,y)
{
  out<-sd(y[abs(x)==1])
  return(out)
}

#create the matrix
A<-matrix(data=c(1,-1,0,1,0,0,0,0,1,1),nrow=5,ncol=2)
B<-matrix(data=c(3,4,5,6,7,8,9,10,11,12),nrow=5,ncol=2)

#run for the first column
foo(A[,1],B[,1])

#run for both columns
apply(X=A, MARGIN=2, FUN=function(x,y) foo(x,y), y=B)

The correct answers are 1.53 and 0.707, which I get when I run foo directly on each column individually.

However, when I try to run both columns with apply, I get 3.06 and 2.94.

Any idea how to change the apply call to make it work? I have a large matrix of assets (in an xts object). Currently I am using a for loop, but I am sure it can be done in a more efficient way.

Thank you in advance,

Nikos

1 Answer

The problem with your approach is that you are passing the whole matrix (B) to your function foo, which expects two vectors (x and y).
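
To see where the 3.06 in the question comes from, here is a small sketch using the A and B defined there. Inside apply, y is the whole matrix B, so the length-5 logical index is recycled over all 10 elements of B, which is treated as one vector in column-major order. The variable names idx, picked, wrong_col1 and right_col1 are only illustrative.

```r
A <- matrix(c(1, -1, 0, 1, 0, 0, 0, 0, 1, 1), nrow = 5, ncol = 2)
B <- matrix(c(3, 4, 5, 6, 7, 8, 9, 10, 11, 12), nrow = 5, ncol = 2)

# What apply() effectively computes for column 1: the logical index has
# length 5, B has 10 elements, so the index is recycled over both columns.
idx <- abs(A[, 1]) == 1
picked <- B[idx]          # 3, 4, 6 from column 1 and 8, 9, 11 from column 2
wrong_col1 <- sd(picked)  # about 3.06, the value reported in the question

# What was intended: index only the matching column of B.
right_col1 <- sd(B[idx, 1])  # about 1.53
```

The same recycling on the second column of A selects 6, 7, 11 and 12, which gives the 2.94 from the question.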

You could try something like this:

sapply(1:ncol(A), function(i) sd(B[as.logical(abs(A[,i])),i]))

[1] 1.5275252 0.7071068

This is basically just a loop...
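
If you want to avoid indexing column by column, one possible variation (a sketch, not benchmarked here and not tested on xts objects) is to blank out the unwanted entries of B in one step and then take the column-wise standard deviation with na.rm = TRUE. B_masked and res are hypothetical names introduced for this example.

```r
A <- matrix(c(1, -1, 0, 1, 0, 0, 0, 0, 1, 1), nrow = 5, ncol = 2)
B <- matrix(c(3, 4, 5, 6, 7, 8, 9, 10, 11, 12), nrow = 5, ncol = 2)

# Copy B and blank out every entry whose partner in A is not +1 or -1.
B_masked <- B
B_masked[abs(A) != 1] <- NA

# Column-wise standard deviation, ignoring the blanked entries.
res <- apply(B_masked, 2, sd, na.rm = TRUE)
print(res)
```

Note that this treats any NA already present in B as "not selected", which differs from foo, where a selected NA would make the result NA.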

Another option, if your A and B objects are data frames, is to use mapply:

A <- as.data.frame(A)
B <- as.data.frame(B)
mapply(foo, A,B)

       V1        V2 
1.5275252 0.7071068 

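Benchmarking the two approaches, the sapply route is roughly twice as fast. I imagine this is because sapply only takes an integer vector as its argument and works on matrices, whereas the mapply approach takes data frames as arguments (data frames are slower than matrices, and more information is passed through the loop than a single index value). Details: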

Unit: microseconds
                                                             expr     min      lq  median       uq      max neval
 sapply(1:ncol(A), function(i) sd(B[as.logical(abs(A[, i])), i])) 101.997 110.080 113.929 118.5480 1515.319  1000
                                              mapply(foo, A2, B2) 191.292 200.529 207.073 215.1555 1707.380  1000
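
For anyone who wants to rerun the comparison, a sketch along these lines should work. It assumes the microbenchmark package is installed, and that A2 and B2 in the output above are data frame copies of A and B (the matrices themselves are kept for the sapply call); the exact setup used for the timings is not shown in the answer.

```r
foo <- function(x, y) sd(y[abs(x) == 1])

A <- matrix(c(1, -1, 0, 1, 0, 0, 0, 0, 1, 1), nrow = 5, ncol = 2)
B <- matrix(c(3, 4, 5, 6, 7, 8, 9, 10, 11, 12), nrow = 5, ncol = 2)

# Assumed: data frame copies for mapply, matrices left untouched for sapply.
A2 <- as.data.frame(A)
B2 <- as.data.frame(B)

# Only run the timing if the microbenchmark package is available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  bm <- microbenchmark::microbenchmark(
    sapply(1:ncol(A), function(i) sd(B[as.logical(abs(A[, i])), i])),
    mapply(foo, A2, B2),
    times = 1000
  )
  print(bm)
}
```

Timings will vary by machine and R version, so expect the ratio rather than the absolute numbers to be comparable.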
answered 2013-08-14T15:31:02.227