
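I have X (n x d), Y (m x d), and a positive-definite L (d x d). I want to compute D, where D_ij is (X_i - Y_j) * L * (X_i - Y_j).T, with X_i the i-th row of X and Y_j the j-th row of Y. n and m are around 250; d is around 10^4.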
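Written out with x_i and y_j as the i-th row of X and the j-th row of Y (treated as row vectors), the target quantity and its term-by-term expansion are as follows. The expansion holds for any square L; when L is symmetric, as positive-definite matrices usually are taken to be, the two cross terms are equal.

```latex
D_{ij} = (x_i - y_j)\, L\, (x_i - y_j)^{\top}
       = x_i L x_i^{\top} - x_i L y_j^{\top} - y_j L x_i^{\top} + y_j L y_j^{\top}
% if L = L^T:
D_{ij} = x_i L x_i^{\top} - 2\, x_i L y_j^{\top} + y_j L y_j^{\top}
```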

I could use scipy.spatial.distance.cdist, but it is slow:

    scipy.spatial.distance.cdist(X, Y, metric='mahalanobis', VI=L)
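One detail worth keeping in mind when comparing against `cdist`: SciPy's `'mahalanobis'` metric returns the square root of the quadratic form, so `cdist(X, Y, metric='mahalanobis', VI=L)**2` is what corresponds to D_ij as defined above. For validating faster versions on small inputs, a plain double loop makes a simple ground truth. This is a sketch added for illustration, not code from the question; the function name `quad_form_distances_loop` and the test matrices are hypothetical.

```python
import numpy as np

def quad_form_distances_loop(X, Y, L):
    """Reference implementation: D[i, j] = (X[i] - Y[j]) @ L @ (X[i] - Y[j])."""
    n, m = X.shape[0], Y.shape[0]
    D = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            v = X[i] - Y[j]          # difference of row i of X and row j of Y, shape (d,)
            D[i, j] = v @ L @ v      # scalar quadratic form
    return D

rng = np.random.default_rng(0)
X = rng.random((3, 4))
Y = rng.random((5, 4))
A = rng.random((4, 4))
L = A @ A.T + 4 * np.eye(4)          # hypothetical symmetric positive-definite test matrix
D_ref = quad_form_distances_loop(X, Y, L)
print(D_ref.shape)
```

It is O(n·m·d²) in pure Python, so it is only meant for tiny test sizes.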

Following Dougal's answer to this question, I tried:

    import numpy as np
    diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]  # shape (m, n, d): diff[j, i] = X[i] - Y[j]
    D = np.einsum('jik,kl,jil->ij', diff, L, diff)    # shape (n, m)

This is also slow.
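A likely part of the cost is the broadcast `diff` array itself: it has shape (m, n, d), so at the stated sizes (n = m = 250, d = 10^4) in float64 it holds 250 × 250 × 10^4 × 8 bytes = 5 × 10^9 bytes, about 5 GB, before L is applied. Also, `np.einsum` with its default `optimize=False` evaluates the three-operand expression without choosing a contraction order; passing `optimize=True` (available since NumPy 1.12) lets it pick an order and possibly route pairwise contractions through BLAS. The sketch below uses small hypothetical sizes to check the shapes and that both einsum calls agree; any speed difference depends on your NumPy build and is not measured here.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, d = 6, 7, 5                      # small hypothetical sizes for illustration
X = rng.random((n, d))
Y = rng.random((m, d))
L = rng.random((d, d))

# Broadcasting (1, n, d) - (m, 1, d) -> (m, n, d); diff[j, i] == X[i] - Y[j]
diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]

D_plain = np.einsum('jik,kl,jil->ij', diff, L, diff)               # no contraction-order optimization
D_opt = np.einsum('jik,kl,jil->ij', diff, L, diff, optimize=True)  # einsum may pick a cheaper order

# Memory footprint of diff at the sizes in the question (float64 itemsize is 8 bytes)
full_bytes = 250 * 250 * 10**4 * np.dtype(np.float64).itemsize
print(diff.shape, D_plain.shape, np.allclose(D_plain, D_opt), full_bytes)
```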

Is there a more efficient way to vectorize this computation?


1 Answer


Use a combination of np.tensordot and np.einsum, which helps in situations like these:

    np.einsum('jil,jil->ij', np.tensordot(diff, L, axes=(2,0)), diff)
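Why this helps: `np.tensordot(diff, L, axes=(2, 0))` contracts the last axis of `diff` with the first axis of `L`, giving `diff @ L` with shape (m, n, d). Internally `tensordot` reshapes this into one 2-D matrix product, which NumPy can hand off to BLAS. The remaining `einsum` then only multiplies elementwise and sums over the last axis. Below is the same one-liner wrapped in a function and checked against the direct per-pair formula; the name `quad_form_distances_tensordot` and the test sizes are hypothetical.

```python
import numpy as np

def quad_form_distances_tensordot(X, Y, L):
    """D[i, j] = (X[i] - Y[j]) @ L @ (X[i] - Y[j]), via tensordot + einsum."""
    diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]   # shape (m, n, d)
    diffL = np.tensordot(diff, L, axes=(2, 0))         # shape (m, n, d): diff @ L along the last axis
    return np.einsum('jil,jil->ij', diffL, diff)       # row-wise dot products, shape (n, m)

rng = np.random.default_rng(2)
X = rng.random((4, 3))
Y = rng.random((5, 3))
L = rng.random((3, 3))

D = quad_form_distances_tensordot(X, Y, L)
v = X[1] - Y[4]                                        # compare one entry with the direct formula
print(D.shape, np.isclose(D[1, 4], v @ L @ v))
```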

Runtime test:

    In [26]: n,m,d = 30,40,50
        ...: X = np.random.rand(n,d)
        ...: L = np.random.rand(d,d)
        ...: Y = np.random.rand(m,d)
        ...: 

    In [27]: diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]

    In [28]: %timeit np.einsum('jik,kl,jil->ij', diff, L, diff)
    100 loops, best of 3: 7.81 ms per loop

    In [29]: %timeit np.einsum('jil,jil->ij',np.tensordot(diff, L, axes=(2,0)), diff)
    1000 loops, best of 3: 472 µs per loop
answered 2017-01-26T07:02:38.657
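Not part of the original answer, but possibly relevant at the question's sizes: the tensordot version still materializes the (m, n, d) `diff` array and multiplies an (m·n) × d matrix by L, roughly m·n·d² ≈ 6.25 × 10^12 multiply-adds for n = m = 250, d = 10^4. Expanding the quadratic form as x_i L x_iᵀ − x_i L y_jᵀ − y_j L x_iᵀ + y_j L y_jᵀ avoids `diff` entirely and needs only X @ L and Y @ L (about (n + m)·d² ≈ 5 × 10^10), plus cheap n × m products. These are operation-count estimates, not measured timings. One caveat: the expanded form subtracts large terms, so it can lose precision when x_i is close to y_j and may return tiny negative values even for positive-definite L. The function name `quad_form_distances_expanded` is hypothetical.

```python
import numpy as np

def quad_form_distances_expanded(X, Y, L):
    """D[i, j] = (X[i] - Y[j]) @ L @ (X[i] - Y[j]) without forming the (m, n, d) diff array.

    Uses x L x^T - x L y^T - y L x^T + y L y^T, which is valid for any square L.
    """
    XL = X @ L                                   # (n, d), about n*d^2 work
    YL = Y @ L                                   # (m, d), about m*d^2 work
    xx = np.einsum('ik,ik->i', XL, X)            # x_i L x_i^T for every i, shape (n,)
    yy = np.einsum('jk,jk->j', YL, Y)            # y_j L y_j^T for every j, shape (m,)
    cross = XL @ Y.T + X @ YL.T                  # x_i L y_j^T + y_j L x_i^T, shape (n, m)
    return xx[:, np.newaxis] - cross + yy[np.newaxis, :]

rng = np.random.default_rng(3)
n, m, d = 6, 7, 5                                # small hypothetical sizes
X = rng.random((n, d))
Y = rng.random((m, d))
L = rng.random((d, d))

# Reference: the tensordot + einsum approach from the answer
diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]
D_ref = np.einsum('jil,jil->ij', np.tensordot(diff, L, axes=(2, 0)), diff)
D_new = quad_form_distances_expanded(X, Y, L)
print(np.allclose(D_ref, D_new))
```

If L is known to be symmetric, `cross` simplifies to `2 * (XL @ Y.T)`, which saves the Y @ L product's second use but not the product itself, since `yy` still needs it.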