Edit: With NumPy 1.9 and later, it looks like inner1d may be faster. (Thanks to Nuno Aniceto for pointing this out):
In [9]: %timeit -n 1000000 inner1d(d,d)
1000000 loops, best of 3: 1.39 µs per loop
In [14]: %timeit -n 1000000 einsum('ij,ij -> i', d, d)
1000000 loops, best of 3: 1.8 µs per loop
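PS. Always run the benchmarks yourself on inputs similar to your intended use case. Results can vary for many reasons, such as input size, hardware, OS, Python version, NumPy version, compiler, and libraries (e.g. ATLAS, MKL, BLAS).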
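To follow that advice outside IPython, a small harness built on the standard-library `timeit` module might look like the sketch below. The helper name `benchmark`, the input shape `(10, 3)` and the loop counts are assumptions for illustration only; substitute the shape, dtype and candidate functions that match your real data.

```python
import timeit

import numpy as np


def benchmark(funcs, d, number=10_000, repeat=3):
    """Hypothetical helper: best per-call time in microseconds for each function on d."""
    results = {}
    for name, func in funcs.items():
        # timeit.repeat runs `number` calls per trial; keep the fastest trial,
        # which is the same "best of N" idea that %timeit reports.
        times = timeit.repeat(lambda: func(d), number=number, repeat=repeat)
        results[name] = min(times) / number * 1e6
    return results


# Assumed input: replace with data shaped like your real use case.
rng = np.random.default_rng(0)
d = rng.standard_normal((10, 3))

candidates = {
    "einsum": lambda a: np.einsum("ij,ij->i", a, a),
    "sum": lambda a: np.sum(a * a, axis=1),
    "diag_dot": lambda a: np.diag(np.dot(a, a.T)),
}

# Print fastest first.
for name, usec in sorted(benchmark(candidates, d).items(), key=lambda kv: kv[1]):
    print(f"{name:>10}: {usec:.2f} us per loop")
```

Taking the minimum over trials, rather than the mean, reduces the influence of background noise on the machine.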
If you have NumPy 1.6 or later, you can use np.einsum:
In [40]: %timeit np.einsum('ij,ij -> i', d, d)
1000000 loops, best of 3: 1.79 us per loop
In [46]: from numpy.core.umath_tests import inner1d
In [48]: %timeit inner1d(d, d)
100000 loops, best of 3: 1.97 us per loop
In [44]: %timeit np.sum(d*d, axis=1)
100000 loops, best of 3: 5.39 us per loop
In [41]: %timeit np.diag(np.dot(d,d.T))
100000 loops, best of 3: 7.2 us per loop
In [42]: %timeit np.array([np.dot(d[i,:],d[i,:]) for i in range(d.shape[0])])
10000 loops, best of 3: 26.1 us per loop
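All of the expressions timed above compute the same result: for each row `i` of the 2-D array `d`, the dot product of that row with itself, i.e. the sum over `j` of `d[i, j] * d[i, j]`. In the einsum subscripts `'ij,ij->i'`, the index `j` appears in both inputs but not in the output, so it is summed over, while `i` is kept. The sketch below spells that out with explicit loops and checks that the approaches agree. The random `d` of shape `(100, 3)` is an assumption, since the answer does not show how `d` was built. `inner1d` comes from `numpy.core.umath_tests`, an internal test module; newer NumPy releases warn against importing it and it may not exist at all, so the sketch treats it as optional.

```python
import warnings

import numpy as np

# Assumed test input; the original answer does not show how d was created.
rng = np.random.default_rng(42)
d = rng.standard_normal((100, 3))

# Reference result: explicit loops spelling out 'ij,ij->i' (sum over j, keep i).
expected = np.zeros(d.shape[0])
for i in range(d.shape[0]):
    for j in range(d.shape[1]):
        expected[i] += d[i, j] * d[i, j]

candidates = {
    "einsum": np.einsum("ij,ij->i", d, d),
    "sum": np.sum(d * d, axis=1),
    "diag_dot": np.diag(np.dot(d, d.T)),  # builds a full n x n matrix first
    "row_loop": np.array([np.dot(d[i, :], d[i, :]) for i in range(d.shape[0])]),
}

# inner1d lives in an internal test module that may warn on import or be
# missing in newer NumPy versions, so only include it if the import works.
try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        from numpy.core.umath_tests import inner1d
    candidates["inner1d"] = inner1d(d, d)
except ImportError:
    pass

for name, result in candidates.items():
    assert np.allclose(result, expected), name
print("all", len(candidates), "approaches agree")
```

The `np.diag(np.dot(d, d.T))` variant computes every pairwise row product and then discards all but the diagonal, which helps explain why it trails `einsum` and `inner1d` in the timings above.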