python - For loop speed with NumPy

Question

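I'm trying to get this code to run fast in Python, but I can't get it anywhere near the speed it runs at in MATLAB. The problem seems to be this for loop, which takes about 2 seconds to run when the number of "SRpixels" is approximately 25000.

I can't find any way to trim it down further, and I'm looking for suggestions.

The data types of the numpy arrays below are float32, except for the **_Location[] arrays, which are uint32.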

for j in range (0,SRpixels):
    #Skip data if outside valid range
    if (abs(SR_pointCloud[j,0]) > SR_xMax or SR_pointCloud[j,2] > SR_zMax or SR_pointCloud[j,2] < 0):
        pass
    else:           
        RIGrid1_Location[j,0] = np.floor(((SR_pointCloud[j,0] + xPosition + 5) - xGrid1Center) / gridSize)
        RIGrid1_Location[j,1] = np.floor(((SR_pointCloud[j,2] + yPosition) - yGrid1LowerBound) / gridSize)

        RIGrid1_Count[RIGrid1_Location[j,0],RIGrid1_Location[j,1]] += 1
        RIGrid1_Sum[RIGrid1_Location[j,0],RIGrid1_Location[j,1]] += SR_pointCloud[j,1]
        RIGrid1_SumofSquares[RIGrid1_Location[j,0],RIGrid1_Location[j,1]] += SR_pointCloud[j,1] * SR_pointCloud[j,1]

        RIGrid2_Location[j,0] = np.floor(((SR_pointCloud[j,0] + xPosition + 5) - xGrid2Center) / gridSize)
        RIGrid2_Location[j,1] = np.floor(((SR_pointCloud[j,2] + yPosition) - yGrid2LowerBound) / gridSize)

        RIGrid2_Count[RIGrid2_Location[j,0],RIGrid2_Location[j,1]] += 1 
        RIGrid2_Sum[RIGrid2_Location[j,0],RIGrid2_Location[j,1]] += SR_pointCloud[j,1]
        RIGrid2_SumofSquares[RIGrid2_Location[j,0],RIGrid2_Location[j,1]] += SR_pointCloud[j,1] * SR_pointCloud[j,1]

I did try using Cython, declaring j as a cdef int j and compiling. There was no noticeable performance gain. Does anyone have suggestions?

score 5 · Accepted Answer

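Vectorization is almost always the best way to speed up numpy code, and most of this looks vectorizable. For example, to start with, the location arrays look straightforward: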

# these are all of your j values
inds = np.arange(0,SRpixels)

# these are the j values you don't want to skip
sel = np.invert((abs(SR_pointCloud[inds,0]) > SR_xMax) | (SR_pointCloud[inds,2] > SR_zMax) | (SR_pointCloud[inds,2] < 0))

RIGrid1_Location[sel,0] = np.floor(((SR_pointCloud[sel,0] + xPosition + 5) - xGrid1Center) / gridSize)
RIGrid1_Location[sel,1] = np.floor(((SR_pointCloud[sel,2] + yPosition) - yGrid1LowerBound) / gridSize)
RIGrid2_Location[sel,0] = np.floor(((SR_pointCloud[sel,0] + xPosition + 5) - xGrid2Center) / gridSize)
RIGrid2_Location[sel,1] = np.floor(((SR_pointCloud[sel,2] + yPosition) - yGrid2LowerBound) / gridSize)

This has no Python loop.

The rest is trickier and depends on what you are doing, but it should also be vectorizable if you think about it the same way.
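As one possible way to vectorize the accumulation steps (the count, sum, and sum of squares per grid cell), the sketch below uses np.bincount on flattened cell indices. A plain fancy-indexed += would apply only one update per repeated cell, so it cannot be used directly here. The function name accumulate_grid, the grid shape, and the sample data are assumptions made for illustration and do not come from the question.

```python
import numpy as np

def accumulate_grid(rows, cols, values, shape):
    """Return per-cell count, sum and sum of squares of the scattered values.

    rows, cols: integer cell indices (hypothetical stand-ins for
    RIGrid1_Location[sel, 0] and RIGrid1_Location[sel, 1]).
    values: the quantity to accumulate (stand-in for SR_pointCloud[sel, 1]).
    """
    n_cells = shape[0] * shape[1]
    # Collapse each 2-D cell index into a single flat index.
    flat = np.ravel_multi_index((rows, cols), shape)
    # bincount handles repeated indices correctly, unlike grid[rows, cols] += v.
    count = np.bincount(flat, minlength=n_cells).reshape(shape)
    total = np.bincount(flat, weights=values, minlength=n_cells).reshape(shape)
    sumsq = np.bincount(flat, weights=values * values, minlength=n_cells).reshape(shape)
    return count, total, sumsq

# Made-up sample data: three points, two of which fall in cell (0, 1).
rows = np.array([0, 0, 1])
cols = np.array([1, 1, 0])
vals = np.array([2.0, 3.0, 4.0])
count, total, sumsq = accumulate_grid(rows, cols, vals, (2, 2))
```

If the grids already hold values from earlier frames, the returned arrays could be added onto them (for example RIGrid1_Sum += total). Note that bincount with weights returns float64, so a cast may be needed if the grids must stay float32. np.add.at is an alternative that updates the grids in place, though it is often slower than bincount.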

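If you really do have something that can't be vectorized and has to be done with a loop (this has only happened to me a few times), I'd suggest Weave over Cython. It is harder to use, but should give speeds comparable to C.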

score 1 · Accepted Answer

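Try to vectorize the calculation first. If you must compute element by element, here are some tips for speeding it up:

Calculations with NumPy scalars are much slower than with built-in scalars. array[i, j] returns a numpy scalar, whereas array.item(i,j) returns a built-in scalar.
For scalar calculations, the functions in the math module are faster than their numpy counterparts.

Here is an example: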

import numpy as np
import math
a = np.array([[1.1, 2.2, 3.3],[4.4, 5.5, 6.6]])
%timeit np.floor(a[0,0]*2)
%timeit math.floor(a[0,0]*2)
%timeit np.floor(a.item(0,0)*2)
%timeit math.floor(a.item(0,0)*2)

Output:

100000 loops, best of 3: 10.2 µs per loop
100000 loops, best of 3: 3.49 µs per loop
100000 loops, best of 3: 6.49 µs per loop
1000000 loops, best of 3: 851 ns per loop

So changing np.floor to math.floor, and changing SR_pointCloud[j,0] to SR_pointCloud.item(j,0), will speed up the loop considerably.
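To show how those two changes might look inside a loop, here is a minimal sketch that bins points into a count grid using item() and math.floor. It is a simplified stand-in for the question's loop, not a rewrite of it: the function name bin_points_scalar, its parameters, the bounds check, and the sample data are all assumptions for illustration.

```python
import math
import numpy as np

def bin_points_scalar(points, x_offset, y_offset, grid_size, shape):
    """Count points per grid cell with a plain Python loop (hypothetical helper)."""
    count = np.zeros(shape, dtype=np.int64)
    for j in range(points.shape[0]):
        # item() returns built-in Python floats instead of numpy scalars.
        x = points.item(j, 0)
        z = points.item(j, 2)
        # math.floor on built-in floats avoids numpy's per-call overhead.
        row = int(math.floor((x - x_offset) / grid_size))
        col = int(math.floor((z - y_offset) / grid_size))
        # Assumed validity test: skip points that fall outside the grid.
        if 0 <= row < shape[0] and 0 <= col < shape[1]:
            count[row, col] += 1
    return count

# Made-up sample data: columns are x, y, z as in SR_pointCloud.
pts = np.array([[0.5, 1.0, 0.5],
                [1.5, 2.0, 0.5],
                [0.6, 3.0, 0.2]], dtype=np.float32)
grid_count = bin_points_scalar(pts, 0.0, 0.0, 1.0, (2, 2))
```

The int() wrapper around math.floor is there because math.floor returns a float on Python 2 and an int on Python 3, and array indices need to be integers either way.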
