arrays - 没有循环的动态矩阵乘法

Question

我遇到了以下问题：

假设我有三对序列：

{a_1, ..., a_N}, {A_1, ..., A_T}, {b_1, ..., b_N}, {B_1, ..., B_T}, {c_1, ..., c_N}, {C_1, ..., C_T}.

我的目标是执行以下操作（不循环！）：

for (i in 1:N) {
  for (j in 1:N) {
    for (k in 1:N) {
      ret[i,j,k] <- \sum_{t=1}^T (a_i - A_t) * (b_j - B_t) * (c_k - C_t)
}}}

我不想循环的原因是可能有比这三个更多的序列对。我想尽可能“高效和灵活”地构建代码。

如果我们只有两对序列，那么问题就很简单了，因为它简化为简单的矩阵乘法（一个(N x T)矩阵(a_i - A_t)和一个(N x T)矩阵，用于(b_i - B_t)将第一个与第二个的转置相乘）。

但是一旦你有超过两对序列，我不确定它是否可以在没有循环的情况下完成，因为的维度output取决于序列对的数量......

------------------------------------------ 相关问题（2013 年 11 月 8 日） ) ------------------------------------------

感谢@-mrip ，我成功实现了第一部分。但是，如果我想要以下内容，必须如何更改代码：

for (i in 1:N) {
  for (j in 1:N) {
    for (k in 1:N) {
      ret[i,j,k] <- \sum_{t=1}^T foo(a_i, a_i - A_t) * foo(b_j, b_j - B_t) * foo(c_k, c_k - C_t)
}}}

哪里foo(a, a-A)是一些常见的二元函数。是否有“通用”解决方案，或者您是否需要有关结构的更多信息foo(a, a-A)？

我尝试使用简单的解决方案并简单地实现循环。当然，这既不灵活（因为我必须事先将自己限制在可能的对/维度数量）也不快速（因为两者a都A可以很大 - 尽管可能a只是一个标量和A一些观察结果）。

我知道我可能要求很高。但是很长一段时间以来，我都完全陷入了这个问题......因此，非常欢迎任何帮助。

score 4 · Accepted Answer

This actually doesn't require any matrix multiplication at all. It only requires taking outer products, and the final result can be put together efficiently and concisely exploiting the column-major layout of R matrices and multidimensional arrays. The computation can be made more efficient by expanding out the product in the loop into individual summands. This leads to the following implementation.

 vecout<-function(...)as.vector(outer(...))

 f2<-function(a,b,c,A,B,C){
 N<-length(a)
 t<-length(A)

 ab<-vecout(a,b)
 ret<-array(vecout(ab,c),c(N,N,N))*t

 ret<-ret - ab * sum(C)
 ret<-ret - vecout(a,rep(c,each = N)) * sum(B)
 ret<-ret - rep(vecout(b,c) * sum(A),each=N)
 ret<-ret + a * sum(B*C)
 ret<-ret + rep(b * sum(A*C),each=N)
 ret<-ret + rep(c * sum(A*B),each=N^2)
 ret<-ret - sum(A*B*C)
 ret
 }

We can compare the running time and check the correctness as follows. Here is the naive implementation:

f1<-function(a,b,c,A,B,C){
 N<-length(a)
 ret<-array(0,c(N,N,N))
 for(i in 1:N)
  for(j in 1:N)
   for(k in 1:N)
    ret[i,j,k]<-sum((a[i]-A)*(b[j]-B)*(c[k]-C))
 ret
 }

Te optimized version runs substantially faster and produces the same result, up to numerical precision error:

> a<-rnorm(100)
> b<-rnorm(100)
> c<-rnorm(100)
> A<-rnorm(200)
> B<-rnorm(200)
> C<-rnorm(200)
> system.time(r1<-f1(a,b,c,A,B,C))
   user  system elapsed 
  9.006   1.125  10.204 
> system.time(r2<-f2(a,b,c,A,B,C))
   user  system elapsed 
  0.203   0.033   0.242 
> max(abs(r1-r2))
[1] 1.364242e-12

If you have more than three sequences each, the same idea will work. It shouldn't be too hard to code up the general solution, in fact, it might well be possible to write the general solution with less lines of code than the hard-coded 3 sequence solution, although it will take a little thought to get all of the index manipulations just right.

On edit: OK, couldn't resist. Here's a general solution, with an arbitrary number of pairs, passed as the columns of two matrices:

f3<-function(a,A){
  subsets<-as.matrix(expand.grid(rep(list(c(F,T)),ncol(a))))
  ret<-array(0,rep(nrow(a),ncol(a)))

  for(i in 1:nrow(subsets)){
    sub<-as.logical(subsets[i,])
    temp<-Reduce(outer,as.list(data.frame(a[,sub,drop=F])),init=1)
    temp<-temp*sum(apply(A[,!sub,drop=F],1,prod))
    temp<-aperm(array(temp,dim(ret)),order(c(which(sub),which(!sub))))
    ret<-ret+temp*(-1)^(sum(!sub))
  } 
  ret
}

> system.time(r3<-f3(cbind(a,b,c),cbind(A,B,C)))
   user  system elapsed 
  0.258   0.056   0.303 
> max(abs(r3-r1))
[1] 9.094947e-13

--------------------- Edit again (Nov 8 2013) ---------------------

The answer give above is the most efficient when the arrays A, B, and C are large. If A, B, and C are much smaller than a, b, and c, then here is an alternate, more concise solution:

f4<-function(a,A){
  ret<-array(0,rep(nrow(a),ncol(a)))
  for(i in 1:nrow(A)){
    temp<- Reduce(outer,as.list(data.frame(a-rep(A[i,],each=nrow(a)))),init=1)
    ret<-ret + as.vector(temp )
  }
  ret
}

In the above setup, where a, b, c, have length 100 and A, B, C have length 200, this is slower than the other solution:

> system.time(r3<-f3(cbind(a,b,c),cbind(A,B,C)))
   user  system elapsed 
  0.704   0.092   0.256 
> system.time(r4<-f4(cbind(a,b,c),cbind(A,B,C)))
   user  system elapsed 
 65.824  19.060   3.553 
> max(abs(r3-r4))
[1] 2.728484e-12

However, if A, B, and C have length 1, then it is much faster:

> A<-rnorm(1)
> B<-rnorm(1)
> C<-rnorm(1)
> system.time(r3<-f3(cbind(a,b,c),cbind(A,B,C)))
   user  system elapsed 
  0.796   0.172   0.222 
> system.time(r4<-f4(cbind(a,b,c),cbind(A,B,C)))
   user  system elapsed 
  0.180   0.012   0.017 
> max(abs(r3-r4))
[1] 7.105427e-15

arrays - 没有循环的动态矩阵乘法

1 回答 1

Related

Reference