
I have two 3D tensors: tensor A with shape [B, N, S] and tensor B also with shape [B, N, S]. What I want to obtain is a third tensor C, which should have shape [B, B, N], where element C[i,j,k] = np.dot(A[i,k,:], B[j,k,:]). I would also like to achieve this in a vectorized way.

Some further information: the two tensors A and B have shape [Batch_size, Num_vectors, Vector_size]. Tensor C should represent the dot product between every element of the batch from A and every element of the batch from B, across all the different vectors.
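
For reference, a naive triple-loop version of the operation I want (slow, just to pin down the semantics; the sizes here are made up for illustration):

import numpy as np

# Naive reference: C[i,j,k] = np.dot(A[i,k,:], B[j,k,:])
batch, n, s = 4, 5, 6          # illustrative sizes
A = np.random.rand(batch, n, s)
B = np.random.rand(batch, n, s)
C = np.empty((batch, batch, n))
for i in range(batch):
    for j in range(batch):
        for k in range(n):
            C[i, j, k] = np.dot(A[i, k, :], B[j, k, :])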

Hope that's clear enough, looking forward to your answers!


3 Answers

In [331]: A=np.random.rand(100,200,300)                                                              
In [332]: B=A

The suggested einsum, working directly from the

C[i,j,k] = np.dot(A[i,k,:], B[j,k,:])

expression:

In [333]: np.einsum( 'ikm, jkm-> ijk', A, B).shape                                                   
Out[333]: (100, 100, 200)
In [334]: timeit np.einsum( 'ikm, jkm-> ijk', A, B).shape                                            
800 ms ± 25.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

matmul does a dot on the last 2 dimensions, and treats the leading one(s) as batch. In your case 'k' is the batch dimension, and 'm' is the one that should obey the rule of being the last axis of A and the second-to-last axis of B. So rewriting ikm,jkm->ijk to fit, and transposing A and B accordingly:

In [335]: np.einsum('kim,kmj->kij', A.transpose(1,0,2), B.transpose(1,2,0)).shape                     
Out[335]: (200, 100, 100)
In [336]: timeit np.einsum('kim,kmj->kij',A.transpose(1,0,2), B.transpose(1,2,0)).shape              
774 ms ± 22.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Not much difference in performance. But now use matmul:

In [337]: (A.transpose(1,0,2)@B.transpose(1,2,0)).transpose(1,2,0).shape                             
Out[337]: (100, 100, 200)
In [338]: timeit (A.transpose(1,0,2)@B.transpose(1,2,0)).transpose(1,2,0).shape                      
64.4 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

and verify that values match (though more often than not, if shapes match, values do too):

In [339]: np.allclose((A.transpose(1,0,2)@B.transpose(1,2,0)).transpose(1,2,0), np.einsum('ikm,jkm->ijk', A, B))
Out[339]: True

I won't try to measure memory usage, but the time improvement suggests it too is better.

In some cases einsum is optimized to use matmul. Here that doesn't seem to be the case, though we could play with its parameters. I'm a little surprised matmul is doing so much better.
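
One parameter worth playing with is einsum's optimize flag; it asks einsum to search for a contraction path that can dispatch part of the work to BLAS. Whether it closes the gap with matmul here is version-dependent (an untimed sketch, not measured above):

# A, B as in the session above; optimize=True asks einsum to search
# for a contraction path that may hand part of the work to BLAS.
C_opt = np.einsum('ikm,jkm->ijk', A, B, optimize=True)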

===

I vaguely recall another SO question about matmul taking a shortcut when the two arrays are the same object, A@A. I used B=A in these tests.

In [350]: timeit (A.transpose(1,0,2)@B.transpose(1,2,0)).transpose(1,2,0).shape                      
60.6 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [352]: B2=np.random.rand(100,200,300)                                                             
In [353]: timeit (A.transpose(1,0,2)@B2.transpose(1,2,0)).transpose(1,2,0).shape                     
97.4 ms ± 164 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

But that only made a modest difference.

In [356]: np.__version__                                                                             
Out[356]: '1.16.4'

My BLAS etc is standard Linux, nothing special.

Answered 2019-06-26T23:30:35.220

I think you can use einsum such as:

np.einsum( 'ikm, jkm-> ijk', A, B)

With the subscripts 'ikm, jkm-> ijk', you specify which dimensions are reduced, following the Einstein convention. The third dimension of both arrays A and B, here named 'm', will be reduced, as the dot operation does on vectors.
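
A quick sanity check on tiny, made-up shapes, verifying one entry against an explicit np.dot:

import numpy as np

A = np.random.rand(2, 3, 4)            # [Batch_size, Num_vectors, Vector_size]
B = np.random.rand(2, 3, 4)
C = np.einsum('ikm,jkm->ijk', A, B)    # shape (2, 2, 3)
# each entry is the dot product of the matching pair of vectors
assert np.isclose(C[1, 0, 2], np.dot(A[1, 2, :], B[0, 2, :]))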

Answered 2019-06-26T13:53:38.793

Try:

C = np.diagonal( np.tensordot(A,B, axes=(2,2)), axis1=1, axis2=3)

From https://docs.scipy.org/doc/numpy/reference/generated/numpy.tensordot.html#numpy.tensordot

Explanation

The solution is a combination of two operations. First, the tensor product of A and B over their third axes, as you want. This outputs a rank-4 tensor, which you want to reduce to a rank-3 tensor by taking equal indices along axes 1 and 3 (k in your notation; note that tensordot gives an axis order different from your math). This can be done by taking the diagonal, just as one reduces a matrix to the vector of its diagonal entries.
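
A runnable sketch of the two steps on tiny, made-up shapes, checked against the einsum form:

import numpy as np

A = np.random.rand(2, 3, 4)
B = np.random.rand(2, 3, 4)

# step 1: tensor product over the last axes -> rank-4 result, axes [i, k, j, l]
full = np.tensordot(A, B, axes=(2, 2))      # shape (2, 3, 2, 3)
# step 2: the diagonal over axes 1 and 3 enforces k == l and appends
# that axis at the end, giving the desired [B, B, N] layout
C = np.diagonal(full, axis1=1, axis2=3)     # shape (2, 2, 3)

assert np.allclose(C, np.einsum('ikm,jkm->ijk', A, B))

Note that this materializes the full [B, N, B, N] product before discarding the off-diagonal entries, so it costs more memory than the einsum route.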

Answered 2019-06-26T13:44:42.820