python - 使用最少内存/cpu 对 3D 数据网格进行矩阵运算

Question

问题设置：我有一个大小为 n1,n2,n3=nx,ny,nz 的 3D（空间）数据网格，对于 a 或 b，可能 nz=1。该网格中的每个点具有每个网格点大小为 NDIM（通常 = 4）的数据向量 (a) 和每个网格点大小为 NDIMxNDIM 的另一个矩阵 (b)。我想为内存和 CPU 最有效地计算（每点）诸如 ab 或 ba 之类的东西。

本质上，我想在 python 中概括 A loopless 3D matrix multiplication。我似乎有一个有效的结果。但我不明白。搜索 google 和 stackoverflow 没有任何帮助。请解释并进一步概括！谢谢！

import numpy as np
# gives a.b per point:
nx=5
ny=8
nz=3
a = np.arange(nx*ny*nz*4).reshape(4, nx,ny,nz)
b = np.arange(nx*ny*1*4*4).reshape(4, 4, nx,ny,1)
ctrue=a*0.0
for ii in np.arange(0,nx):
    for jj in np.arange(0,ny):
        for kk in np.arange(0,nz):
           ctrue[:,ii,jj,kk] = np.tensordot(a[:,ii,jj,kk],b[:,:,ii,jj,0],axes=[0,1])

c2 = (a[:,None,None,None] * b[:,:,None,None,None]).sum(axis=1).reshape(4,nx,ny,nz)
np.sum(ctrue-c2)
# gives 0 as required


# gives b.a per point:
ctrue2=a*0.0
for ii in np.arange(0,nx):
    for jj in np.arange(0,ny):
        for kk in np.arange(0,nz):
                ctrue2[:,ii,jj,kk] = np.tensordot(a[:,ii,jj,kk],b[:,:,ii,jj,0],axes=[0,0])

btrans=np.transpose(b,(1,0,2,3,4))
c22 = (a[:,None,None,None] * btrans[:,:,None,None,None]).sum(axis=1).reshape(4,nx,ny,nz)
np.sum(ctrue2-c22)
# gives 0 as required

# Note that only the single line for c2 and c22 are required -- the rest of the code is for testing/comparison to see if that line works.

# Issues/Questions:
# 1) Please explain why those things work and further generalize!
# 2) After reading about None=np.newaxis, I thought something like this would work:
    c22alt = (a[:,None,:,:,:] * btrans[:,:]).sum(axis=1).reshape(4,nx,ny,nz)
    np.sum(ctrue2-c22alt)
# but it doesn't.
# 3) I don't see how to avoid assignment of a separate btrans.  An np.transpose on b[:,:,None,None,None] doesn't work.

其他相关链接： Numpy: Multiplying a matrix with a 3d tensor -- Suggestion How to use numpy with 'None' value in Python?

score 1 · Accepted Answer

首先，您的代码非常复杂。产品a.b和b.a可以简化为：

c2 = (a * b).sum(axis=1)
c22 = (a * b.swapaxes(0, 1)).sum(axis=1)

请注意，np.sum(ctrue - c2)您应该使用np.all(ctrue == c2);而不是如果这两种方法恰好给出相同和的结果，则前者可能会给出错误的结果！

为什么这行得通？考虑一个元素：

a0 = a[:, 0, 0, 0]
b0 = b[:, :, 0, 0, 0]

取张量点np.tensordot(a0, b0, axes=(0, 1))等价于(a0 * b0).sum(axis=1)。这是因为广播；的(4, )形状a0被广播到的(4, 4)形状，b0并且数组按元素相乘；1然后在轴上求和给出张量点。

对于其他点积，np.tensordot(a0, b0, axes=(0, 0))相当于(a0 * b0.T).sum(axis=1)where b0.Tis same as b0.transpose()which is the same as b0.swapaxes(0, 1)。通过转置b0，a0有效地对着另一个轴进行广播b0；我们可以通过得到相同的结果(a0[:, None] * b0).sum(axis=0)。

NumPy elementwise 操作的伟大之处在于，只要它们的形状对应或可以广播，您就可以完全忽略更高的轴，因此适用于a0并且b0（大部分）适用于a和b同样适用的方法。

最后，我们可以使用Einstein summation更清楚地说明这一点：

c2 = np.einsum('i...,ji...->j...', a, b)
c22 = np.einsum('i...,ij...->j...', a, b)

python - 使用最少内存/cpu 对 3D 数据网格进行矩阵运算

1 回答 1

Related

Reference