python - Computing indices of a SciPy labeled array faster than with np.where

Question

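I'm working with a large array (3000 x 3000), which I label with scipy.ndimage.label. It returns 3403 labels along with the labeled array. I need the indices of each label: for example, for label 1 I want the rows and columns it occupies in the labeled array. So basically something like this: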

a[0] = array([[1, 1, 0, 0],
              [1, 1, 0, 2],
              [0, 0, 0, 2],
              [3, 3, 0, 0]])


indices = [np.where(a[0] == t + 1) for t in range(a[1])]  # a[1] = 3 is the number of labels

print(indices)
[(array([0, 0, 1, 1]), array([0, 1, 0, 1])), (array([1, 2]), array([3, 3])), (array([3, 3]), array([0, 1]))]
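For a self-contained reproduction, the sketch below shows where the `a` tuple comes from: `scipy.ndimage.label` returns `(labeled_array, num_features)`, i.e. `a[0]` and `a[1]`. The binary image `img` is a hypothetical input chosen to reproduce the 4 x 4 example. It also shows why the loop is slow: every `np.where` call compares every pixel, so for a 3000 x 3000 array with 3403 labels that is about 3403 × 9,000,000 ≈ 3 × 10^10 comparisons.

```python
import numpy as np
from scipy import ndimage

# Hypothetical binary image that reproduces the 4 x 4 example above
img = np.array([[1, 1, 0, 0],
                [1, 1, 0, 1],
                [0, 0, 0, 1],
                [1, 1, 0, 0]])

# ndimage.label returns (labeled_array, num_features), i.e. a[0] and a[1]
a = ndimage.label(img)
labeled, num_labels = a

# Baseline from the question: one full-array comparison per label
indices = [np.where(labeled == t + 1) for t in range(num_labels)]
```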

I want to build such an index list for all 3403 labels, but the approach above seems slow. I tried a generator instead, and it doesn't seem to help.

Is there an efficient way to do this?
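One commonly suggested speedup for the `np.where` loop itself (a different technique from the accepted answer below) is to search only inside each label's bounding box. `scipy.ndimage.find_objects` returns a list of slice tuples, where entry `k-1` belongs to label `k` and is `None` if that label is absent. This is a sketch assuming a 2-D labeled array; the function name `label_indices_bbox` is hypothetical. It helps most when regions are compact; long, sprawling regions can have bounding boxes that still cover much of the array.

```python
import numpy as np
from scipy import ndimage

def label_indices_bbox(labeled, num_labels):
    """Hypothetical helper: (rows, cols) for labels 1..num_labels, searching only each bounding box."""
    out = []
    slices = ndimage.find_objects(labeled, max_label=num_labels)
    for lab, slc in enumerate(slices, start=1):
        if slc is None:  # label does not occur in the array
            empty = np.array([], dtype=np.intp)
            out.append((empty, empty))
            continue
        # np.nonzero on the small sub-array gives bounding-box-local coordinates
        r, c = np.nonzero(labeled[slc] == lab)
        # Shift them back to full-array coordinates
        out.append((r + slc[0].start, c + slc[1].start))
    return out
```

Calling `label_indices_bbox(a[0], a[1])` returns a list in the same layout as the `np.where` list in the question.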

score 0 · Accepted Answer

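The idea for improving efficiency is to minimize the work done once inside the loop. Since each label has a variable number of elements, a fully vectorized approach isn't possible. With that in mind, here is one solution: sort the flattened labels once so that each label's pixels form a contiguous run, then do nothing but slicing inside the loop.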

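import numpy as np

a_flattened = a[0].ravel()
# Stable sort groups equal labels together and keeps row-major order within each label
sidx = np.argsort(a_flattened, kind='stable')
afs = a_flattened[sidx]
# Start/end positions of each run of equal labels in the sorted array
cut_idx = np.r_[0, np.flatnonzero(afs[1:] != afs[:-1]) + 1, a_flattened.size]
# Convert flat positions back to (row, col) once, outside the loop
row, col = np.unravel_index(sidx, a[0].shape)
# Only cheap slicing happens per label
row_indices = [row[i:j] for i, j in zip(cut_idx[:-1], cut_idx[1:])]
col_indices = [col[i:j] for i, j in zip(cut_idx[:-1], cut_idx[1:])]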
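A variant of the same sort-and-split idea (a sketch, not the answer's exact code; `label_indices_sorted` is a hypothetical name) uses `np.bincount` to count the pixels of each label and `np.split` to cut the sorted positions. With `minlength=num_labels + 1`, every label from 0 to `num_labels` gets a slot even if it has no pixels, so list position `k` always corresponds to label `k`.

```python
import numpy as np

def label_indices_sorted(labeled, num_labels):
    """Hypothetical helper: list of (rows, cols) for labels 0..num_labels (entry 0 is background)."""
    flat = labeled.ravel()
    # Stable sort keeps each label's pixels in row-major (np.where) order
    sidx = np.argsort(flat, kind="stable")
    # Pixel count per label; minlength reserves a slot for labels with no pixels
    counts = np.bincount(flat, minlength=num_labels + 1)
    rows, cols = np.unravel_index(sidx, labeled.shape)
    # Cumulative counts mark where one label's run ends and the next begins
    split_at = np.cumsum(counts)[:-1]
    return list(zip(np.split(rows, split_at), np.split(cols, split_at)))
```

`label_indices_sorted(a[0], a[1])[1:]` then gives one `(rows, cols)` pair per label, in the same format as the `np.where` list in the question. `np.bincount` requires non-negative integers, which holds for `scipy.ndimage.label` output.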

Sample input and output:

In [59]: a[0]
Out[59]: 
array([[1, 1, 0, 0],
       [1, 1, 0, 2],
       [0, 0, 0, 2],
       [3, 3, 0, 0]])

In [60]: a[1]
Out[60]: 3

In [62]: row_indices # row indices
Out[62]: 
[array([0, 0, 1, 2, 2, 2, 3, 3]), # for label-0
 array([0, 0, 1, 1]),             # for label-1
 array([1, 2]),                   # for label-2    
 array([3, 3])]                   # for label-3

In [63]: col_indices  # column indices
Out[63]: 
[array([2, 3, 2, 0, 1, 2, 2, 3]), # for label-0
 array([0, 1, 0, 1]),             # for label-1
 array([3, 3]),                   # for label-2
 array([0, 1])]                   # for label-3

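Apart from their first elements, row_indices and col_indices are the expected output. The first group in each list corresponds to the 0-th region (the unlabeled background), so you will probably want to skip it, e.g. with row_indices[1:] and col_indices[1:].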
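Skipping the first group assumes label 0 actually occurs; if the image has no background pixels, the first group is label 1. A slightly more defensive sketch keys each run by its label value instead of its position. The dictionary `groups` is an illustrative addition, the 4 x 4 sample is reused as a hypothetical input, and the lookup assumes every label from 1 to `num_labels` is present, which holds for `scipy.ndimage.label` output.

```python
import numpy as np

# Sample labeled array from the answer (hypothetical input); a[1] would be 3 here
labeled = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 2],
                    [0, 0, 0, 2],
                    [3, 3, 0, 0]])
num_labels = 3

flat = labeled.ravel()
sidx = np.argsort(flat, kind="stable")
afs = flat[sidx]
cut_idx = np.r_[0, np.flatnonzero(afs[1:] != afs[:-1]) + 1, flat.size]
row, col = np.unravel_index(sidx, labeled.shape)

# Key each contiguous run by the label value it holds, not by its position
groups = {int(afs[i]): (row[i:j], col[i:j])
          for i, j in zip(cut_idx[:-1], cut_idx[1:])}
groups.pop(0, None)  # drop the background group only if it exists

# Same layout as the question's np.where list: entry k-1 holds label k
indices = [groups[k] for k in range(1, num_labels + 1)]
```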
