python - Fast peak finding and centroiding in Python

Question

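I am trying to develop a fast algorithm in Python to find peaks in an image and then find the centroids of those peaks. I wrote the following code using scipy.ndimage.label and ndimage.find_objects to locate the objects. This seems to be the bottleneck in the code: locating 20 objects in a 500x500 image takes about 7 ms. I would like to scale this up to larger (2000x2000) images, but then the time increases to almost 100 ms. So I am wondering whether there is a faster option.

Here is the code I have so far, which works but is slow. First, I simulate my data using some Gaussian peaks. This part is slow, but in practice I will be using real data, so I am not too concerned about speeding it up. What I want is to find the peaks very quickly.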

import time
import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage
import matplotlib.patches 

plt.figure(figsize=(10,10))
ax1 = plt.subplot(221)
ax2 = plt.subplot(222)
ax3 = plt.subplot(223)
ax4 = plt.subplot(224)

size        = 500 #width and height of image in pixels
peak_height = 100 # define the height of the peaks
num_peaks   = 20
noise_level = 50
threshold   = 60

np.random.seed(3)

#set up a simple, blank image (Z)
x = np.linspace(0,size,size)
y = np.linspace(0,size,size)

X,Y = np.meshgrid(x,y)
Z = X*0

#now add some peaks
def gaussian(X,Y,xo,yo,amp=100,sigmax=4,sigmay=4):
    return amp*np.exp(-(X-xo)**2/(2*sigmax**2) - (Y-yo)**2/(2*sigmay**2))

for xo,yo in size*np.random.rand(num_peaks,2):
    widthx = 5 + np.random.randn(1)
    widthy = 5 + np.random.randn(1)
    Z += gaussian(X,Y,xo,yo,amp=peak_height,sigmax=widthx,sigmay=widthy)

#of course, add some noise:
Z = Z + scipy.ndimage.gaussian_filter(0.5*noise_level*np.random.rand(size,size),sigma=5)    
Z = Z + scipy.ndimage.gaussian_filter(0.5*noise_level*np.random.rand(size,size),sigma=1)    

t = time.time() #Start timing the peak-finding algorithm

#Set everything below the threshold to zero:
Z_thresh = np.copy(Z)
Z_thresh[Z_thresh<threshold] = 0
print('Time after thresholding: %.5f seconds'%(time.time()-t))

#now find the objects
labeled_image, number_of_objects = scipy.ndimage.label(Z_thresh)
print('Time after labeling: %.5f seconds'%(time.time()-t))

peak_slices = scipy.ndimage.find_objects(labeled_image)
print('Time after finding objects: %.5f seconds'%(time.time()-t))

def centroid(data):
    h,w = np.shape(data)   
    x = np.arange(0,w)
    y = np.arange(0,h)

    X,Y = np.meshgrid(x,y)

    cx = np.sum(X*data)/np.sum(data)
    cy = np.sum(Y*data)/np.sum(data)

    return cx,cy

centroids = []

for peak_slice in peak_slices:
    dy,dx  = peak_slice
    x,y = dx.start, dy.start
    cx,cy = centroid(Z_thresh[peak_slice])
    centroids.append((x+cx,y+cy))

print('Total time: %.5f seconds\n'%(time.time()-t))

###########################################
#Now make the plots:
for ax in (ax1,ax2,ax3,ax4): ax.clear()
ax1.set_title('Original image')
ax1.imshow(Z,origin='lower')

ax2.set_title('Thresholded image')
ax2.imshow(Z_thresh,origin='lower')

ax3.set_title('Labeled image')
ax3.imshow(labeled_image,origin='lower') #display the color-coded regions

for peak_slice in peak_slices:  #Draw some rectangles around the objects
    dy,dx  = peak_slice
    xy     = (dx.start - 0.5, dy.start - 0.5) #imshow centers pixel i on coordinate i
    width  = (dx.stop - dx.start)             #slice stops are exclusive
    height = (dy.stop - dy.start)
    rect = matplotlib.patches.Rectangle(xy,width,height,fc='none',ec='red')
    ax3.add_patch(rect)

ax4.set_title('Centroids on original image')
ax4.imshow(Z,origin='lower')

for x,y in centroids:
    ax4.plot(x,y,'kx',ms=10)

ax4.set_xlim(0,size)
ax4.set_ylim(0,size)

plt.tight_layout()
plt.show()

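Results for size=500: [image]

Edit: If the number of peaks is large (~100) and the image is small, the bottleneck is actually the centroiding part. So perhaps the speed of that part needs to be optimized as well.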

score 9 · Accepted Answer

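Your method for finding the peaks (simple thresholding) is of course very sensitive to the choice of threshold: set it too low and you will "detect" things that are not peaks; set it too high and you will miss valid peaks.

There are more robust alternatives that detect all the local maxima in the image intensity regardless of their intensity value. My preferred one is to apply a dilation with a small (5x5 or 7x7) structuring element and then find the pixels where the original image and its dilated version have the same value. This works because, by definition, dilation(x, y, E, img) = {max of img within E centered at pixel (x, y)}, and therefore dilation(x, y, E, img) = img(x, y) whenever (x, y) is the location of a local maximum at the scale of E.

With fast implementations of the morphological operators (e.g. the ones in OpenCV), this algorithm is linear in the size of the image in both space and time (one extra image-sized buffer for the dilated image, and one pass over each of the two images). In a pinch, it can also be implemented on-line without the extra buffer at the cost of a little extra complexity, and it is still linear time.

To make it more robust in the presence of salt-and-pepper or similar noise, which may introduce many false maxima, you can apply the method twice with structuring elements of different sizes (say, 5x5 and 7x7) and then retain only the stable maxima, where stability can be defined as the position of the maximum not changing, or not changing by more than one pixel, etc. In addition, you may want to suppress lower nearby maxima when you have reason to believe they are due to noise. An efficient way to do this is to first detect all the local maxima as above, sort them by height in descending order, then go down the sorted list and keep each maximum if its value in the image has not changed; whenever one is kept, set to zero all the pixels in its (2d+1) x (2d+1) neighborhood, where d is the minimum distance between nearby maxima that you are willing to tolerate.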
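A minimal sketch of the dilation test described above, assuming scipy.ndimage.maximum_filter as the flat grayscale dilation (the answer mentions OpenCV; SciPy is swapped in here only because the question already uses it). The function name local_maxima and the min_height parameter are illustrative, not part of the original answer. Note that on a perfectly flat background every pixel equals its dilation, so some height cutoff is still needed to discard plateaus.

```python
import numpy as np
import scipy.ndimage


def local_maxima(img, size=5, min_height=None):
    """Return (row, col) positions where img equals its grayscale dilation."""
    # A maximum filter over a flat size x size window is a grayscale dilation.
    dilated = scipy.ndimage.maximum_filter(img, size=size)
    is_peak = img == dilated
    if min_height is not None:
        # Hypothetical extra step: drop flat background that trivially
        # equals its own dilation.
        is_peak &= img > min_height
    return np.argwhere(is_peak)


# Small demonstration: two isolated bumps on a zero background.
img = np.zeros((20, 20))
img[4, 5] = 3.0
img[14, 12] = 7.0
peaks = local_maxima(img, size=5, min_height=0)
print(peaks)
```

The positions come back in (row, col) order, so they need swapping before being plotted as (x, y).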
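The two refinements above could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: stability is taken in its strictest form (the maximum sits on exactly the same pixel for every window size), and instead of zeroing the image a separate boolean mask records which pixels have been suppressed, which has the same effect without modifying the data. The names stable_maxima and suppress_nearby are made up for this example.

```python
import numpy as np
import scipy.ndimage


def stable_maxima(img, sizes=(5, 7), min_height=0.0):
    """Keep only maxima found on the same pixel for every window size."""
    mask = img > min_height
    for size in sizes:
        mask &= img == scipy.ndimage.maximum_filter(img, size=size)
    return np.argwhere(mask)


def suppress_nearby(img, peaks, d):
    """Visit peaks from highest to lowest; keep a peak only if no higher kept
    peak has already claimed its pixel, then claim its (2d+1) x (2d+1) window."""
    suppressed = np.zeros(img.shape, dtype=bool)
    heights = img[peaks[:, 0], peaks[:, 1]]
    kept = []
    for idx in np.argsort(heights)[::-1]:
        r, c = peaks[idx]
        if suppressed[r, c]:
            continue  # lies too close to a higher peak that was kept
        kept.append((int(r), int(c)))
        suppressed[max(r - d, 0):r + d + 1, max(c - d, 0):c + d + 1] = True
    return kept


# Demonstration: a weak maximum 4 pixels away from a strong one.
img = np.zeros((20, 20))
img[5, 5] = 10.0
img[5, 9] = 6.0
img[15, 15] = 8.0
peaks = stable_maxima(img, sizes=(5, 7))
kept = suppress_nearby(img, peaks, d=5)
print(peaks.tolist(), kept)
```

Here all three pixels are maxima at both window sizes, but the weak one at (5, 9) falls inside the neighborhood of the stronger peak at (5, 5) and is dropped.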

score 5 · Accepted Answer

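If you have many peaks, use scipy.ndimage.center_of_mass. You can replace your code, starting at the definition of peak_slices and ending just before the total time is printed, with the following two lines: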

centroids = scipy.ndimage.center_of_mass(Z_thresh, labeled_image,
                                         np.arange(1, number_of_objects + 1))
centroids = [(j, i) for i, j in centroids]

For num_peaks = 20 this runs about 3x slower than your approach, but for num_peaks = 100 it runs about 10x faster. So your best option will depend on your actual data.
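To see why the second of those two lines swaps the tuple order, here is a small self-contained check on a toy image (the array values are invented for illustration). scipy.ndimage.center_of_mass returns one coordinate per array axis, i.e. (row, col), whereas the question stores centroids as (x, y).

```python
import numpy as np
import scipy.ndimage

img = np.zeros((6, 8))
img[1, 1:3] = 2.0   # object 1: equal weights at row 1, columns 1 and 2
img[4, 5] = 1.0     # object 2: row 4, columns 5 and 6 with weights 1 and 3
img[4, 6] = 3.0

labels, n = scipy.ndimage.label(img)
centers = scipy.ndimage.center_of_mass(img, labels, np.arange(1, n + 1))

# Swap (row, col) to (x, y) so the result matches the question's convention.
centroids_xy = [(float(col), float(row)) for row, col in centers]
print(centroids_xy)
```

All labeled objects are handled in a single call, which is where the advantage for large numbers of peaks comes from.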

score 2 · Accepted Answer

Another approach is to avoid all the sum(), meshgrid() and so on, and replace everything with straight linear algebra.

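def centroid2(data):
    h, w = data.shape
    rows = np.arange(h)          # row (y) indices
    cols = np.arange(w)          # column (x) indices
    ones_h = np.ones((1, h))
    ones_w = np.ones((w, 1))
    total = np.dot(np.dot(ones_h, data), ones_w)   # sum of all pixels, shape (1, 1)
    cx = np.dot(np.dot(ones_h, data), cols) / total
    cy = np.dot(np.dot(rows, data), ones_w) / total
    return cx, cy
# be careful: it returns two arrays of shape (1, 1), not two scalars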

This can also be extended to higher dimensions. It gives about a 60% speedup compared to centroid().
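One possible way to carry the same idea to N dimensions, offered as a sketch rather than as the answerer's own extension: contract each axis against its index vector with np.tensordot and divide by the total. The function name centroid_nd is hypothetical, and the result is ordered by array axis (so (row, col) in 2-D, not (x, y)).

```python
import numpy as np


def centroid_nd(data):
    """Intensity-weighted centroid of an N-d array, one coordinate per axis."""
    data = np.asarray(data, dtype=float)
    total = data.sum()
    coords = []
    for axis, n in enumerate(data.shape):
        # Contract this axis against its index vector, then sum what remains.
        moment = np.tensordot(data, np.arange(n), axes=([axis], [0])).sum()
        coords.append(moment / total)
    return tuple(coords)


# Two equal voxels that differ only along the last axis.
vol = np.zeros((3, 4, 5))
vol[1, 2, 3] = 1.0
vol[1, 2, 4] = 1.0
print(centroid_nd(vol))
```

No timing claim is made for this version; the 60% figure above refers to the 2-D centroid2().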

score 0 · Accepted Answer

The following centroid calculation is faster than both, especially for large data:

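def centroidnp(data):
    h,w = data.shape
    x = np.arange(w)
    y = np.arange(h)
    vx = data.sum(axis=0)   # column sums: marginal distribution along x
    vx = vx / vx.sum()      # not in-place, so integer input works too
    vy = data.sum(axis=1)   # row sums: marginal distribution along y
    vy = vy / vy.sum()
    return np.dot(vx,x),np.dot(vy,y)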
