
If I have numpy arrays A and B, then I can compute the trace of their matrix product with:

tr = numpy.linalg.trace(A.dot(B))

However, the matrix multiplication A.dot(B) unnecessarily computes all of the off-diagonal entries in the matrix product, when only the diagonal elements are used in the trace. Instead, I could do something like:

tr = 0.0
for i in range(n):
    tr += A[i, :].dot(B[:, i])

but this performs the loop in Python code and isn't as obvious as numpy.linalg.trace.

Is there a better way to compute the trace of a matrix product of numpy arrays? What is the fastest or most idiomatic way to do this?


3 Answers


You can improve on @Bill's solution by reducing the intermediate storage to the diagonal elements only:

import numpy as np
from numpy.core.umath_tests import inner1d

m, n = 1000, 500

a = np.random.rand(m, n)
b = np.random.rand(n, m)

# They all should give the same result
print(np.trace(a.dot(b)))
print(np.sum(a * b.T))
print(np.sum(inner1d(a, b.T)))

%timeit np.trace(a.dot(b))
10 loops, best of 3: 34.7 ms per loop

%timeit np.sum(a*b.T)
100 loops, best of 3: 4.85 ms per loop

%timeit np.sum(inner1d(a, b.T))
1000 loops, best of 3: 1.83 ms per loop
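
Note that numpy.core.umath_tests has since been deprecated (and inner1d is absent from recent NumPy releases), so as a rough, untimed sketch, the same row-wise inner products can be obtained with einsum, reusing the a and b arrays from above:

# Sketch: an equivalent of inner1d(a, b.T) for NumPy versions without umath_tests.
# einsum('ij,ji->i', a, b) computes the diagonal of a.dot(b) directly.
diag = np.einsum('ij,ji->i', a, b)
print(np.sum(diag))  # should match np.trace(a.dot(b))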

Another option is to use np.einsum, with no explicit intermediate storage at all:

# Will print the same as the others:
print(np.einsum('ij,ji->', a, b))

On my system it runs slightly slower than using inner1d, but that may not hold on all systems; see this question.

%timeit np.einsum('ij,ji->', a, b)
100 loops, best of 3: 1.91 ms per loop
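
For what it's worth, the subscripts 'ij,ji->' tell einsum to multiply a[i, j] by b[j, i] and sum over both indices, which is exactly tr(a.b). A tiny sanity check with made-up 2x3 and 3x2 arrays (purely illustrative):

small_a = np.arange(6.0).reshape(2, 3)
small_b = np.arange(6.0).reshape(3, 2)
# Both print 50.0: the sum over i, j of small_a[i, j] * small_b[j, i]
print(np.einsum('ij,ji->', small_a, small_b))
print(np.trace(small_a.dot(small_b)))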
answered 2013-09-17T16:39:24.763

From Wikipedia, you can calculate the trace using the Hadamard product (element-wise multiplication):

# Tr(A.B)
tr = (A*B.T).sum()

I think this takes less computation than doing numpy.trace(A.dot(B)), since the full matrix product computes every entry of A.dot(B) while the element-wise product only computes the terms that actually contribute to the diagonal.
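
As a quick sanity check of that identity, tr(A.B) = sum over i, j of A[i, j] * B[j, i], which is the sum of the element-wise product of A and B.T; a minimal sketch with small random matrices (names and sizes are just illustrative):

import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 6))
B = rng.random((6, 4))

# tr(A.B) = sum over i, j of A[i, j] * B[j, i], i.e. the sum of A * B.T
assert np.isclose(np.trace(A.dot(B)), (A * B.T).sum())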

Edit:

Ran some timers. This way is much faster than using numpy.trace.

In [37]: timeit("np.trace(A.dot(B))", setup="""import numpy as np;
A, B = np.random.rand(1000,1000), np.random.rand(1000,1000)""", number=100)
Out[37]: 8.6434469223022461

In [39]: timeit("(A*B.T).sum()", setup="""import numpy as np;
A, B = np.random.rand(1000,1000), np.random.rand(1000,1000)""", number=100)
Out[39]: 0.5516049861907959
answered 2013-09-17T16:12:13.400

Note that a slight variant is to take the dot product of the vectorized matrices. In Python, vectorization is done with .flatten('F'). On my computer it is slightly slower than taking the sum of the Hadamard product, so it is a worse solution than wflynny's, but I think it is interesting because it can be more intuitive in some situations. For example, I personally find the vectorized solution easier to understand in the context of the matrix normal distribution.
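The identity being used here is tr(A.B) = vec(A) . vec(B'), where vec stacks columns (column-major, i.e. 'F' order). A minimal check, with small matrices chosen purely for illustration, before the speed comparison below:

import numpy as np

A = np.arange(6.0).reshape(2, 3)
B = np.arange(6.0).reshape(3, 2)

# vec(A) . vec(B') with column-major ('F') flattening equals tr(A.B)
assert np.isclose(np.trace(A.dot(B)),
                  A.flatten('F').dot(B.T.flatten('F')))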

Speed comparison, on my system:

import numpy as np
import time

N = 1000

np.random.seed(123)
A = np.random.randn(N, N)
B = np.random.randn(N, N)

start = time.time()
for i in range(10):
    C = np.trace(A.dot(B))
print(time.time() - start, C)

start = time.time()
for i in range(10):
    C = A.flatten('F').dot(B.T.flatten('F'))
print(time.time() - start, C)

start = time.time()
for i in range(10):
    C = (A.T * B).sum()
print(time.time() - start, C)

start = time.time()
for i in range(10):
    C = (A * B.T).sum()
print(time.time() - start, C)

Results:

6.246593236923218 -629.370798672
0.06539678573608398 -629.370798672
0.057890892028808594 -629.370798672
0.05709719657897949 -629.370798672
answered 2017-02-22T15:18:57.843