
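I have a function that computes a matrix for me, but it is really slow. It is still slow even in Cython, so I'd like to know whether anything can be done to speed up the code below.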

EDIT: I have changed des = np.zeros([n-m+1,m]) to cdef np.ndarray des = np.zeros([n-m+1,m], dtype=DTYPE), which is faster than what I had without np.empty... In place of m/2 I added cdef int m2 = m/2, but that doesn't seem to help at all.

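import numpy as np   # Python-level import, needed for np.zeros
cimport numpy as np  # C-level declarations (np.ndarray buffer types, np.float_t)
cimport cython

DTYPE = float
ctypedef np.float_t DTYPE_t

@cython.boundscheck(False)
@cython.cdivision(True)
@cython.wraparound(False)
cpdef map4(np.ndarray[DTYPE_t, ndim=1] s, int m): 

  cdef int n = len(s)
  cdef int i
  cdef int j

  # des is not typed here, so every des[...] assignment goes through Python
  des = np.zeros([n-m+1,m])
  for j in xrange(m):
      for i in xrange(m/2,n-m/2-1):
          des[i-m/2,j] = s[i-j+m/2]

  return des, s, m, n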

Typically n ~ 10000 and m = 1001.
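A plain NumPy transcription of the loop can make it clearer what it computes. This is a sketch, not the poster's code: it assumes odd m (as with m = 1001), uses Python 3 // for the C integer division m/2, and the names map4_reference and map4_vectorized are made up for illustration. With i = r + m/2, the source index i - j + m/2 becomes r + m - 1 - j, so row r of des is the window s[r:r+m] reversed, and the whole matrix can be built with one fancy-indexing operation instead of a double loop. The loop bound n-m/2-1 also stops one row short, so in the original the last row of des stays zero, while the vectorized version fills it.

```python
import numpy as np

def map4_reference(s, m):
    # Direct transcription of the Cython loop (slow; for checking only).
    n = len(s)
    h = m // 2                      # m/2 under C integer division
    des = np.zeros((n - m + 1, m))
    for j in range(m):
        for i in range(h, n - h - 1):
            des[i - h, j] = s[i - j + h]
    return des

def map4_vectorized(s, m):
    # Builds every row at once: des[r, j] = s[r + m - 1 - j] (odd m assumed).
    n = len(s)
    rows = np.arange(n - m + 1)[:, None]
    cols = np.arange(m)[None, :]
    return s[rows + m - 1 - cols]

s = np.arange(10, dtype=float)
ref = map4_reference(s, 5)
vec = map4_vectorized(s, 5)
print(ref[0])                              # [4. 3. 2. 1. 0.]
print(np.array_equal(ref[:-1], vec[:-1]))  # True
print(ref[-1], vec[-1])                    # [0. 0. 0. 0. 0.] [9. 8. 7. 6. 5.]
```

The fancy-indexing version allocates a temporary index array the same shape as des. On NumPy 1.20 or later, np.lib.stride_tricks.sliding_window_view(s, m)[:, ::-1] gives the same values as a read-only view without that copy.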

3 Answers

Try:

cdef np.ndarray des = np.zeros([n-m+1,m])

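You can also make it more specific, the way you did for the argument s (for example np.ndarray[DTYPE_t, ndim=2]), so that indexing des becomes a typed buffer access. You can also turn off bounds checking; your @cython.boundscheck(False) decorator already does this, but it only helps once des is a typed buffer. Have a look at the Cython NumPy tutorial.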

You might also want to create a variable:

cdef int m_2 = m/2

and use it everywhere you currently have m/2, because I don't know whether Cython will optimize that for you.

answered 2013-03-12T18:43:35.650

Assuming you are going to assign every element, it may also help to use np.empty instead of np.zeros:

des = np.empty([n-m+1,m])
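One caveat, worked out from the question's loop bounds rather than stated in this answer: the loop never writes the last row of des for odd m (the last two rows for even m), so with np.empty those rows would hold whatever was already in memory instead of zeros. A minimal NumPy sketch, with the made-up function name map4_empty, that allocates with np.empty and clears only the rows the loop skips:

```python
import numpy as np

def map4_empty(s, m):
    n = len(s)
    h = m // 2                       # m/2 under C integer division
    des = np.empty((n - m + 1, m))   # uninitialized: every cell must be written
    for j in range(m):
        for i in range(h, n - h - 1):
            des[i - h, j] = s[i - j + h]
    # Rows from n - 2*h - 1 onward are never touched by the loop; clear them
    # so the result matches the np.zeros version.
    des[n - 2 * h - 1:, :] = 0.0
    return des

s = np.arange(10, dtype=float)
print(map4_empty(s, 5)[-1])   # [0. 0. 0. 0. 0.]
```

The explicit clear is only needed because of the loop's current upper bound; if the loop wrote every row, np.empty alone would be safe.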
answered 2013-03-12T19:00:15.427

I don't see m being set anywhere. At the bottom of your post you mention that n ~ 10,000 and m = 1001. Does that mean m is a constant 32-bit integer? Since I can't see your compilation flags: it's often worth trying with and without -ffast-math to see whether it makes a difference. With large arrays and matrices, a smaller data type usually gives a significant speedup, provided it preserves the range and accuracy your program needs, though I don't see a large potential benefit for this calculation.
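To put a number on the smaller-data-type suggestion: with the sizes from the question (n ~ 10000, m = 1001), des has (n - m + 1) x m = 9000 x 1001 elements. The sketch below only computes its size for float64 versus float32, without allocating it; whether float32 keeps enough precision depends on the data.

```python
import numpy as np

n, m = 10000, 1001            # sizes quoted in the question
rows, cols = n - m + 1, m     # shape of des
for dtype in (np.float64, np.float32):
    nbytes = rows * cols * np.dtype(dtype).itemsize
    print(np.dtype(dtype).name, nbytes / 1e6, "MB")
# float64 72.072 MB
# float32 36.036 MB
```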

If you could show us the C code that is generated by this, that might help, as well.

answered 2013-11-08T08:43:48.890