
I have a range image of a scene. I traverse the image and compute the average depth change under a detection window. The detection window changes size based on the average depth of the pixels around the current location. I accumulate the average change to produce a simple response image.

The majority of the time is spent in the for loop; it takes about 40+ seconds for a 512x52 image on my machine. I was hoping to speed this up. Is there a more efficient/faster way to iterate over the image? Is there a better pythonic/numpy/scipy way to visit each pixel? Or should I go learn Cython?

EDIT: I reduced the runtime to about 18 seconds by using scipy.misc.imread() instead of skimage.io.imread(). Not sure what the difference is; I will try to investigate.

Here is a simplified version of the code:

import matplotlib.pylab as plt
import numpy as np
from skimage.io import imread
from skimage.transform import integral_image, integrate
import time

def intersect(a, b):
    '''Determine the intersection of two rectangles'''
    rect = (0,0,0,0)
    r0 = max(a[0],b[0])
    c0 = max(a[1],b[1])
    r1 = min(a[2],b[2])
    c1 = min(a[3],b[3])
    # Do we have a valid intersection?
    if r1 > r0 and  c1 > c0: 
         rect = (r0,c0,r1,c1)
    return rect

# Setup data
depth_src = imread("test.jpg", as_grey=True)
depth_intg = integral_image(depth_src)   # integrate to find sum depth in region
depth_pts = integral_image(depth_src > 0)  # integrate to find num points which have depth
boundary = (0,0,depth_src.shape[0]-1,depth_src.shape[1]-1) # rectangle to intersect with

# Image to accumulate response
out_img = np.zeros(depth_src.shape)

# Average dimensions of bbox/detection window per unit length of depth
model = (0.602,2.044)  # width, height

start_time = time.time()
for (r,c), junk in np.ndenumerate(depth_src):
    # Find points around current pixel      
    r0, c0, r1, c1 = intersect((r-1, c-1, r+1, c+1), boundary)

    # Calculate average of depth of points around current pixel
    scale =  integrate(depth_intg, r0, c0, r1, c1) * 255 / 9.0 

    # Based on average depth, create the detection window
    r0 = r - (model[0] * scale/2)
    c0 = c - (model[1] * scale/2)
    r1 = r + (model[0] * scale/2)
    c1 = c + (model[1] * scale/2)

    # Used scale optimised detection window to extract features
    r0, c0, r1, c1 = intersect((r0,c0,r1,c1), boundary)
    depth_count = integrate(depth_pts,r0,c0,r1,c1)
    if depth_count:
         depth_sum = integrate(depth_intg,r0,c0,r1,c1)
         avg_change = depth_sum / depth_count
         # Accumulate response
         out_img[r0:r1,c0:c1] += avg_change
print time.time() - start_time, " seconds"

plt.imshow(out_img)
plt.gray()
plt.show()

1 Answer


Michael, interesting question. It looks like the main performance issue is that every pixel in the image has two integrate() calls computed for it, one over a 3x3 window and one over a window whose size is not known in advance. Calculating individual integrals in this way is extremely inefficient regardless of which numpy functions you use; it is an algorithmic problem, not an implementation problem.

Consider an image of size N×N. You can calculate all the sums of any window size K×K in that image using only about 4*N² operations, not (as one might naively expect) N²*K². The way to do that is to first compute an image of sliding sums over a window of K along each row, and then sliding sums over that result along each column. Updating each sliding sum to move to the next pixel requires only adding the newest pixel of the current window and subtracting the oldest pixel of the previous window, i.e. two operations per pixel regardless of window size. It has to be done twice (for rows and then columns), so that's 4 operations per pixel.

I am not sure whether numpy has a built-in sliding window sum, but this answer suggests a couple of ways to do it using stride tricks: https://stackoverflow.com/a/12713297/1828289. You can certainly also do the same thing with one loop over rows and one loop over columns (taking slices to extract a row/column).
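
(Not part of the original answer.) Just to illustrate the kind of windowed view the stride-tricks approach builds, here is a minimal 1D sketch; note that it sums each window independently rather than using the add/subtract update described above, so it is mainly a reference for the linked answer:

import numpy as np
from numpy.lib.stride_tricks import as_strided

def sliding_sum_1d(x, K):
    # overlapping view of shape (len(x) - K + 1, K); no data is copied
    windows = as_strided(x, shape=(x.size - K + 1, K),
                         strides=(x.strides[0], x.strides[0]))
    return windows.sum(axis=1)

x = np.arange(10.0)
print(sliding_sum_1d(x, 3))   # sums of each length-3 window: 3, 6, 9, ..., 24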

Example of the loop approach:

import numpy

# img is a 2D ndarray
# K is the size of sums to calculate using a sliding window
row_sums = numpy.zeros_like(img)
for i in range(img.shape[0]):
    if i == 0:
        row_sums[i,:] = img[i,:]
    elif i < K:
        # window not yet full: keep accumulating rows
        row_sums[i,:] = row_sums[i-1,:] + img[i,:]
    else:
        # slide the window: add the newest row, drop the oldest
        row_sums[i,:] = row_sums[i-1,:] + img[i,:] - img[i-K,:]

col_sums = numpy.zeros_like(img)
for j in range(img.shape[1]):
    if j == 0:
        col_sums[:,j] = row_sums[:,j]
    elif j < K:
        col_sums[:,j] = col_sums[:,j-1] + row_sums[:,j]
    else:
        col_sums[:,j] = col_sums[:,j-1] + row_sums[:,j] - row_sums[:,j-K]

# here col_sums[i,j] equals numpy.sum(img[i-K+1:i+1, j-K+1:j+1]) for i >= K-1 and j >= K-1
# the first K-1 rows and columns of col_sums contain partial sums and can be ignored
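
A quick sanity check of the loops (my addition; it assumes img, K and col_sums as just defined above): compare one entry against a direct numpy sum.

i, j = img.shape[0] - 1, img.shape[1] - 1        # any position with i, j >= K - 1
expected = img[i-K+1:i+1, j-K+1:j+1].sum()
print(numpy.isclose(col_sums[i, j], expected))   # should print True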

How do you best apply this to your case? I think you probably want to pre-compute the sums at 3x3 (for the average depth) and at several larger sizes, and use the 3x3 value to select one of the larger sizes for the detection window (assuming I understand the intent of your algorithm). The range of larger sizes you need may be limited, or artificially limiting it may still work acceptably well; just pick the nearest available size. Computing all the sums together with sliding sums is so much more efficient that I am almost certain it is worth doing even for many sizes you will never use at a particular pixel, especially if some of those sizes are large.
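
For illustration only (this is not the asker's exact algorithm): a rough sketch of that precompute-and-select idea. The sliding_sum() helper below is written with numpy.cumsum, which produces the same K x K sums as the loop version above; the candidate sizes and the rule mapping the 3x3 average depth to a window size are placeholders.

import numpy as np

def sliding_sum(img, K):
    # K x K sliding-window sums; equivalent to the row/column loops above
    c = np.cumsum(np.cumsum(img, axis=0, dtype=float), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))              # c[i, j] == img[:i, :j].sum()
    out = np.zeros(img.shape, dtype=float)
    out[K-1:, K-1:] = c[K:, K:] - c[:-K, K:] - c[K:, :-K] + c[:-K, :-K]
    return out

depth = np.random.rand(512, 512)                 # stand-in for the range image

# precompute sums and point counts for a few candidate detection-window sizes
sizes = np.array([3, 9, 15, 21, 27])             # illustrative, not tuned
sums = np.stack([sliding_sum(depth, K) for K in sizes])
counts = np.stack([sliding_sum((depth > 0).astype(float), K) for K in sizes])

# 3x3 average depth at every pixel (sizes[0] == 3), guarding against empty windows
avg3 = sums[0] / np.maximum(counts[0], 1)

# pick, per pixel, the candidate size closest to some scale derived from avg3
target = 10.0 * avg3                             # placeholder scale rule
idx = np.abs(sizes[None, None, :] - target[..., None]).argmin(axis=2)

# look up the precomputed sum/count of the chosen size at every pixel
rows, cols = np.indices(depth.shape)
chosen_sum = sums[idx, rows, cols]
chosen_cnt = counts[idx, rows, cols]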

P.S. This is a minor addition, but you may want to avoid calling intersect() for every pixel: either (a) only process pixels which are farther from the edge than the max integral size, or (b) add margins to the image of the max integral size on all sides, filling the margins with either zeros or nans, or (c) (best approach) use slices to take care of this automatically: a slice index outside the boundary of an ndarray is automatically limited to the boundary, except of course negative indexes are wrapped around.
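
As a small illustration of point (c), not from the original answer: the upper bound of a slice may run past the array edge and numpy will clip it silently, so only the lower bound needs a guard against going negative:

import numpy as np

img = np.arange(25).reshape(5, 5)
r, c = 0, 4                                      # a corner pixel
window = img[max(r-1, 0):r+2, max(c-1, 0):c+2]   # no intersect() needed
# r+2 and c+2 past the edge are clipped automatically; but r-1 == -1 would wrap
# around to the last row, hence the max(..., 0) guard on the lower bounds
print(window.shape)                              # (2, 2) at this corner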

EDIT: added example of sliding window sums

Answered on 2012-11-22T09:02:56.227