This actually doesn't require any matrix multiplication at all. It only requires taking outer products, and the final result can be put together efficiently and concisely exploiting the column-major layout of R matrices and multidimensional arrays. The computation can be made more efficient by expanding out the product in the loop into individual summands. This leads to the following implementation.
vecout<-function(...)as.vector(outer(...))
f2<-function(a,b,c,A,B,C){
N<-length(a)
t<-length(A)
ab<-vecout(a,b)
ret<-array(vecout(ab,c),c(N,N,N))*t
ret<-ret - ab * sum(C)
ret<-ret - vecout(a,rep(c,each = N)) * sum(B)
ret<-ret - rep(vecout(b,c) * sum(A),each=N)
ret<-ret + a * sum(B*C)
ret<-ret + rep(b * sum(A*C),each=N)
ret<-ret + rep(c * sum(A*B),each=N^2)
ret<-ret - sum(A*B*C)
ret
}
We can compare the running time and check the correctness as follows. Here is the naive implementation:
f1<-function(a,b,c,A,B,C){
N<-length(a)
ret<-array(0,c(N,N,N))
for(i in 1:N)
for(j in 1:N)
for(k in 1:N)
ret[i,j,k]<-sum((a[i]-A)*(b[j]-B)*(c[k]-C))
ret
}
Te optimized version runs substantially faster and produces the same result, up to numerical precision error:
> a<-rnorm(100)
> b<-rnorm(100)
> c<-rnorm(100)
> A<-rnorm(200)
> B<-rnorm(200)
> C<-rnorm(200)
> system.time(r1<-f1(a,b,c,A,B,C))
user system elapsed
9.006 1.125 10.204
> system.time(r2<-f2(a,b,c,A,B,C))
user system elapsed
0.203 0.033 0.242
> max(abs(r1-r2))
[1] 1.364242e-12
If you have more than three sequences each, the same idea will work. It shouldn't be too hard to code up the general solution, in fact, it might well be possible to write the general solution with less lines of code than the hard-coded 3 sequence solution, although it will take a little thought to get all of the index manipulations just right.
On edit: OK, couldn't resist. Here's a general solution, with an arbitrary number of pairs, passed as the columns of two matrices:
f3<-function(a,A){
subsets<-as.matrix(expand.grid(rep(list(c(F,T)),ncol(a))))
ret<-array(0,rep(nrow(a),ncol(a)))
for(i in 1:nrow(subsets)){
sub<-as.logical(subsets[i,])
temp<-Reduce(outer,as.list(data.frame(a[,sub,drop=F])),init=1)
temp<-temp*sum(apply(A[,!sub,drop=F],1,prod))
temp<-aperm(array(temp,dim(ret)),order(c(which(sub),which(!sub))))
ret<-ret+temp*(-1)^(sum(!sub))
}
ret
}
> system.time(r3<-f3(cbind(a,b,c),cbind(A,B,C)))
user system elapsed
0.258 0.056 0.303
> max(abs(r3-r1))
[1] 9.094947e-13
--------------------- Edit again (Nov 8 2013) ---------------------
The answer give above is the most efficient when the arrays A
, B
, and C
are large. If A
, B
, and C
are much smaller than a
, b
, and c
, then here is an alternate, more concise solution:
f4<-function(a,A){
ret<-array(0,rep(nrow(a),ncol(a)))
for(i in 1:nrow(A)){
temp<- Reduce(outer,as.list(data.frame(a-rep(A[i,],each=nrow(a)))),init=1)
ret<-ret + as.vector(temp )
}
ret
}
In the above setup, where a
, b
, c
, have length 100
and A
, B
, C
have length 200
, this is slower than the other solution:
> system.time(r3<-f3(cbind(a,b,c),cbind(A,B,C)))
user system elapsed
0.704 0.092 0.256
> system.time(r4<-f4(cbind(a,b,c),cbind(A,B,C)))
user system elapsed
65.824 19.060 3.553
> max(abs(r3-r4))
[1] 2.728484e-12
However, if A
, B
, and C
have length 1
, then it is much faster:
> A<-rnorm(1)
> B<-rnorm(1)
> C<-rnorm(1)
> system.time(r3<-f3(cbind(a,b,c),cbind(A,B,C)))
user system elapsed
0.796 0.172 0.222
> system.time(r4<-f4(cbind(a,b,c),cbind(A,B,C)))
user system elapsed
0.180 0.012 0.017
> max(abs(r3-r4))
[1] 7.105427e-15