numpy - Faster way to sample indices by the values of an ndarray

Question

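I have some fairly large arrays to process. By large I mean shapes like (514, 514, 374). I want to pick indices at random based on their pixel values. For example, I need a 3-D index whose pixel value equals 1. So I list all the candidates with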

indices = np.asarray(np.where(img_arr == 1)).T

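This works, except that it is unbearably slow because the arrays are so large. So my question is: is there a better way to do this? Even better would be a way to pass in a list of pixel values and get back a list of corresponding indices. For example, I want to sample indices for the pixel values [0, 1, 2] and get back a list of indices such as [[1,2,3], [53, 215, 11], [223, 42, 113]].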
Since I am working with medical images, solutions using SimpleITK are also welcome. Feel free to share your thoughts, thanks.
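For reference, the interface the question asks for (a list of values in, one random index per value out) can be written on top of the full-scan approach. This is a minimal baseline sketch, not a faster solution: it still scans the whole array once per value, just like np.where(img_arr == 1). The helper name sample_index_per_value is hypothetical, and it uses np.flatnonzero plus np.unravel_index instead of np.where(...).T so that only one flat position is converted back to a 3-D index.

```python
import numpy as np

def sample_index_per_value(img_arr, values, rng=None):
    """Hypothetical helper: for each value in `values`, return one randomly
    chosen N-d index where img_arr equals that value (full scan per value)."""
    rng = np.random.default_rng(rng)
    result = []
    for v in values:
        # Flat positions of all voxels equal to v (one full pass over the array)
        flat = np.flatnonzero(img_arr == v)
        if flat.size == 0:
            result.append(None)  # value does not occur in the array
            continue
        # Pick one flat position uniformly and convert it back to an N-d index
        pick = rng.choice(flat)
        result.append([int(i) for i in np.unravel_index(pick, img_arr.shape)])
    return result

img_arr = np.random.default_rng(0).integers(0, 5, size=(10, 30, 20))
samples = sample_index_per_value(img_arr, [0, 1, 2], rng=42)
print(samples)  # three [z, y, x] lists, one per requested value
```

Each entry is a plain Python list, matching the [[1,2,3], [53, 215, 11], ...] shape described in the question.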

score 1 · Accepted Answer

import numpy as np

value = 1
# value_list = [1, 3, 5]  # a list of values also works, see the line marked -> *
n_samples = 3
n_subset = 500

# Create an example array
img_arr = np.random.randint(low=0, high=5, size=(10, 30, 20))

# Randomly choose n_subset positions (with replacement); shape (n_subset, ndim)
idx_subset = np.array([np.random.randint(0, s, size=n_subset) for s in img_arr.shape]).T
# Get the values at the sampled positions (index with a tuple of per-axis arrays)
values_subset = img_arr[tuple(idx_subset.T)]
# Check which values match
idx_subset_matching_temp = np.where(values_subset == value)[0]
# idx_subset_matching_temp = np.argwhere(np.isin(values_subset, value_list)).ravel()  -> *
# Get all the indices of the subset with the correct value(s)
idx_subset_matching = idx_subset[idx_subset_matching_temp, :]
# Shuffle the rows of the index array in place
np.random.shuffle(idx_subset_matching)
# Only keep as many as you need
idx_subset_matching = idx_subset_matching[:n_samples, :]
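The same steps can be wrapped into a reusable function. This is a sketch that goes beyond the answer: sample_matching_indices is a hypothetical name, it uses NumPy's Generator API (default_rng, integers, shuffle) instead of the legacy np.random functions, accepts a single value or a list of values through np.isin, and, as an added assumption, falls back to a full np.argwhere scan when the subset contains fewer than n_samples matches.

```python
import numpy as np

def sample_matching_indices(img_arr, values, n_samples, n_subset, rng=None):
    """Hypothetical wrapper around the subset-sampling idea.
    Falls back to a full scan when the random subset yields too few matches."""
    rng = np.random.default_rng(rng)
    # Draw n_subset random N-d positions (with replacement), shape (n_subset, ndim)
    idx_subset = np.column_stack(
        [rng.integers(0, s, size=n_subset) for s in img_arr.shape]
    )
    # Read the values at those positions in one fancy-indexing call
    values_subset = img_arr[tuple(idx_subset.T)]
    # Keep the rows whose value is one of the requested values
    matching = idx_subset[np.isin(values_subset, values)]
    if len(matching) < n_samples:
        # Too few hits in the subset (value is rare): scan the whole array instead
        matching = np.argwhere(np.isin(img_arr, values))
    rng.shuffle(matching)  # shuffles the rows in place
    return matching[:n_samples]

img_arr = np.random.default_rng(0).integers(0, 5, size=(10, 30, 20))
idx = sample_matching_indices(img_arr, [1, 3], n_samples=3, n_subset=500, rng=1)
print(idx)  # up to 3 rows of [z, y, x] whose value is 1 or 3
```

The fallback keeps the function correct for sparse values, at the price of the full scan the answer is trying to avoid; if even the full scan finds fewer than n_samples matches, fewer rows are returned.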

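This gives you the samples you need. Their distribution should be the same as with your approach of looking at all matches in the array: in both cases you get a uniform distribution over all positions with a matching value. Because the positions are drawn with replacement, the same index can appear more than once among the samples.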
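The uniformity claim can be checked empirically on a toy array. In this sketch (the toy array, subset size and trial count are arbitrary choices) the subset draw is repeated many times, one matching row is picked at random, which is equivalent to shuffling and keeping the first row, and the number of times each matching position is chosen is counted. The counts should come out roughly equal.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 4x5 array with values 0, 1, 2 repeating; value 1 occurs at 7 positions
img_arr = (np.arange(20) % 3).reshape(4, 5)
value, n_subset, trials = 1, 50, 10_000

counts = {}
for _ in range(trials):
    # Same subset draw as in the answer: random positions with replacement
    idx_subset = np.column_stack([rng.integers(0, s, size=n_subset) for s in img_arr.shape])
    hits = idx_subset[img_arr[tuple(idx_subset.T)] == value]
    if len(hits) == 0:
        continue  # no match in this subset; extremely unlikely here
    # Picking a random row is equivalent to shuffling and taking the first one
    pick = tuple(int(i) for i in hits[rng.integers(len(hits))])
    counts[pick] = counts.get(pick, 0) + 1

for pos, c in sorted(counts.items()):
    print(pos, c)  # each of the 7 positions should be chosen about trials / 7 times
```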

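You have to be careful when choosing the subset size and the number of samples. The subset must be large enough that enough values match; otherwise you get back fewer than n_samples indices. On average about n_subset × p positions match, where p is the fraction of voxels that have the wanted value. A similar problem occurs if the value you want to sample is very sparse: then the subset has to be very large (in the edge case, the whole array) and you gain nothing.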
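One way to pick n_subset is to estimate the match fraction p from a small pilot sample and size the subset so that the expected number of matches, n_subset × p, is a few times n_samples. This is a heuristic assumed here, not something from the answer; choose_subset_size, safety and n_pilot are hypothetical names with arbitrary defaults. It returns None when the value looks too rare, i.e. when a full np.where scan is the better option.

```python
import math
import numpy as np

def choose_subset_size(img_arr, value, n_samples, safety=4.0, n_pilot=1000, rng=None):
    """Hypothetical heuristic: estimate p from a pilot sample, then pick n_subset
    so that n_subset * p is about `safety` times n_samples.
    Returns None when subset sampling would not beat a full scan."""
    rng = np.random.default_rng(rng)
    # Pilot draw of random positions (with replacement), shape (n_pilot, ndim)
    pilot = np.column_stack([rng.integers(0, s, size=n_pilot) for s in img_arr.shape])
    p_hat = np.mean(img_arr[tuple(pilot.T)] == value)  # estimated match fraction
    if p_hat == 0:
        return None  # no hits in the pilot: value is sparse, scan the whole array
    n_subset = math.ceil(safety * n_samples / p_hat)
    if n_subset >= img_arr.size:
        return None  # subset would be as large as the array: nothing gained
    return n_subset

dense = np.arange(6000).reshape(10, 30, 20) % 5   # p = 0.2 for every value
print(choose_subset_size(dense, 1, n_samples=3, rng=0))  # roughly 4 * 3 / 0.2 = 60
```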

If you sample from the same array often, it is also a good idea to compute and store the indices for each value i once

indices_i = np.asarray(np.where(img_arr == i)).T

and reuse them in your later computations.
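If many values are needed from the same array, the per-value indices can also be built in a single pass instead of calling np.where once per value. This sketch is an alternative to the answer's suggestion, not the answer's own code: build_value_index and sample_from_index are hypothetical names. It groups flat positions by value with one stable np.argsort and stores flat positions rather than (z, y, x) triples, converting with np.unravel_index only when sampling. On a 64-bit platform argsort returns 8-byte indices, so for a (514, 514, 374) array the index alone takes roughly 790 MB.

```python
import numpy as np

def build_value_index(img_arr):
    """Hypothetical helper: map each distinct value to the sorted flat
    positions where it occurs, using one stable argsort."""
    flat = img_arr.ravel()
    # Stable sort keeps positions of equal values in ascending (C) order
    order = np.argsort(flat, kind="stable")
    # Start offset of each run of equal values in the sorted array
    uniq, starts = np.unique(flat[order], return_index=True)
    groups = np.split(order, starts[1:])  # views into `order`, one per value
    return {v.item(): g for v, g in zip(uniq, groups)}

def sample_from_index(value_index, shape, values, rng=None):
    """Draw one N-d index uniformly at random for each requested value.
    Raises KeyError if a value does not occur in the array."""
    rng = np.random.default_rng(rng)
    out = []
    for v in values:
        flat_pos = rng.choice(value_index[v])
        out.append([int(i) for i in np.unravel_index(flat_pos, shape)])
    return out

img_arr = np.random.default_rng(0).integers(0, 5, size=(10, 30, 20))
value_index = build_value_index(img_arr)  # one pass, reused for many queries
samples = sample_from_index(value_index, img_arr.shape, [0, 1, 2], rng=1)
print(samples)
```

Building the index costs one sort of the whole array, so it only pays off when the same array is queried repeatedly.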
