python - Optimizing NumPy with Cython

Question

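I am currently trying to optimize code that I wrote in pure Python. The code uses NumPy very heavily, since I am working with NumPy arrays. Below you can see the simplest of my classes that I converted to Cython. It only multiplies two NumPy arrays. Here:

bendingForces = self.matrixPrefactor * membraneHeight

My question is whether and how I can optimize this: when I look at the C code generated by "cython -a", it contains a lot of NumPy calls, which does not look very efficient.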

import numpy as np
cimport numpy as np
ctypedef np.float64_t dtype_t
ctypedef np.complex128_t cplxtype_t
ctypedef Py_ssize_t index_t

cdef class bendingForcesClass( object ):
    cdef dtype_t bendingRigidity
    cdef np.ndarray matrixPrefactor
    cdef np.ndarray bendingForces

    def __init__( self, dtype_t bendingRigidity, np.ndarray[dtype_t, ndim=2] waveNumbersNorm ):
        self.bendingRigidity = bendingRigidity
        self.matrixPrefactor = -self.bendingRigidity * waveNumbersNorm**2

    cpdef np.ndarray calculate( self, np.ndarray membraneHeight ) :
        cdef np.ndarray bendingForces
        bendingForces = self.matrixPrefactor * membraneHeight
        return bendingForces

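My idea was to use two for loops and iterate over the entries of the arrays. Perhaps the compiler could then optimize this with SIMD operations?! I tried that and it compiles, but it gives strange results and takes forever. Here is the code of the replacement function: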

cpdef np.ndarray calculate( self, np.ndarray membraneHeight ) :

    cdef index_t index1, index2 # corresponds to: cdef Py_ssize_t index1, index2
    for index1 in range( self.matrixSize ):
        for index2 in range( self.matrixSize ):
            self.bendingForces[ index1, index2 ] = self.matrixPrefactor.data[ index1, index2 ] * membraneHeight.data[ index1, index2 ]
    return self.bendingForces

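However, as I said, this code is really slow and does not work as expected. So what am I doing wrong? What is the best way to optimize this and remove the NumPy calls?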

score 9 · Accepted Answer

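For a simple array multiplication like this one, the NumPy code is already doing only the looping and multiplying natively, so it would be hard to beat that in Cython. Cython is great for situations where you replace loops in Python with loops in Cython. One of the reasons your code is slower than NumPy is that every time you do an index lookup in your array,

self.bendingForces[ index1, index2 ] = self.matrixPrefactor.data[ index1, index2 ] * membraneHeight.data[ index1, index2 ]

it does extra work such as bounds checking (verifying that the index is valid). If you declare the indices as unsigned integers, so that no negative-index handling is needed, you can put the @cython.boundscheck(False) decorator before the function to turn this check off.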

See this tutorial for more details on speeding up Cython code.
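To illustrate the point that NumPy already runs the loop in compiled code, here is a minimal plain-Python sketch, not the asker's actual code. The names BendingForcesNumPy and calculate_with_loops, the 8x8 test data, and the assumption that the membrane heights are complex are all hypothetical choices made for the example. It compares the vectorized product with an explicit double loop and uses the out= argument of np.multiply to write into a preallocated buffer.

```python
import numpy as np


class BendingForcesNumPy:
    """Plain-NumPy stand-in for the Cython class in the question (hypothetical name)."""

    def __init__(self, bending_rigidity, wave_numbers_norm):
        # Precompute the prefactor once, as the original __init__ does.
        self.matrix_prefactor = -bending_rigidity * wave_numbers_norm**2
        # Preallocated output buffer; complex because the heights are assumed complex.
        self.bending_forces = np.empty(wave_numbers_norm.shape, dtype=np.complex128)

    def calculate(self, membrane_height):
        # Element-wise product; the loop runs inside NumPy's compiled code and
        # writes into the existing buffer instead of allocating a new array.
        np.multiply(self.matrix_prefactor, membrane_height, out=self.bending_forces)
        return self.bending_forces


def calculate_with_loops(matrix_prefactor, membrane_height):
    """Same product with explicit Python loops, for comparison only (slow)."""
    rows, cols = matrix_prefactor.shape
    result = np.empty((rows, cols), dtype=np.complex128)
    for i in range(rows):
        for j in range(cols):
            result[i, j] = matrix_prefactor[i, j] * membrane_height[i, j]
    return result


# Small made-up inputs, just to check that both versions agree.
rng = np.random.default_rng(0)
wave_numbers_norm = rng.random((8, 8))
membrane_height = rng.random((8, 8)) + 1j * rng.random((8, 8))

forces = BendingForcesNumPy(2.0, wave_numbers_norm)
vectorized = forces.calculate(membrane_height)
looped = calculate_with_loops(forces.matrix_prefactor, membrane_height)
print(np.allclose(vectorized, looped))
```

Note that calculate returns the internal buffer, so a later call overwrites the earlier result; copy it if both are needed. A hand-written Cython loop would have to beat the vectorized version, not the Python loop, to be worth the effort.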

score 0 · Answer

You can probably speed this up by using

for index1 from 0 <= index1 < max1:

instead of using a range, which I am not sure is typed.

Have you checked out this and this?
