python - How to find the most frequent value in a numpy ndarray?

Question

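I have a numpy ndarray of shape (30,480,640). Axes 1 and 2 represent position (latitude and longitude), and axis 0 contains the actual data points. I want to take the most frequent value along axis 0 at each location, which amounts to constructing a new array of shape (1,480,640). That is: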

>>> data
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[40, 40, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

(perform calculation)

>>> new_data 
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]]])

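The data points will contain negative and positive floating-point numbers. How can I perform such a calculation? Many thanks!

I tried using numpy.unique, but I got "TypeError: unique() got an unexpected keyword argument 'return_inverse'". I am using numpy version 1.2.1 installed on Unix, and it does not support return_inverse. I also tried mode, but it takes forever to process such a large amount of data. Is there another way to get the most frequent value? Thanks again.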

score 24 · Accepted Answer

To find the most frequent value of a flat array, use unique, bincount and argmax:

arr = np.array([5, 4, -2, 1, -2, 0, 4, 4, -6, -1])
u, indices = np.unique(arr, return_inverse=True)
u[np.argmax(np.bincount(indices))]

To work with a multidimensional array, we don't need to worry about unique, but we do need to use apply_along_axis on bincount:

arr = np.array([[5, 4, -2, 1, -2, 0, 4, 4, -6, -1],
                [0, 1,  2, 2,  3, 4, 5, 6,  7,  8]])
axis = 1
u, indices = np.unique(arr, return_inverse=True)
u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(arr.shape),
                                None, np.max(indices) + 1), axis=axis)]

Using your data:

data = np.array([
   [[ 0,  1,  2,  3,  4],
    [ 5,  6,  7,  8,  9],
    [10, 11, 12, 13, 14],
    [15, 16, 17, 18, 19]],

   [[ 0,  1,  2,  3,  4],
    [ 5,  6,  7,  8,  9],
    [10, 11, 12, 13, 14],
    [15, 16, 17, 18, 19]],

   [[40, 40, 42, 43, 44],
    [45, 46, 47, 48, 49],
    [50, 51, 52, 53, 54],
    [55, 56, 57, 58, 59]]])
axis = 0
u, indices = np.unique(data, return_inverse=True)
u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(data.shape),
                                None, np.max(indices) + 1), axis=axis)]
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
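The steps above can be wrapped into a small reusable function. This is only a sketch of the same unique/bincount/argmax idea; the name mode_along_axis and the tiny float array are made up for illustration and are not part of the original answer.

```python
import numpy as np

def mode_along_axis(arr, axis=0):
    """Most frequent value along `axis` (hypothetical helper name)."""
    arr = np.asarray(arr)
    # Map every element to the index of its value in the sorted unique array.
    u, indices = np.unique(arr, return_inverse=True)
    indices = indices.reshape(arr.shape)
    # One bin per unique value; minlength keeps every bincount the same length.
    counts = np.apply_along_axis(np.bincount, axis, indices, None, len(u))
    # The bin with the highest count identifies the most frequent value.
    return u[np.argmax(counts, axis=axis)]

# Illustrative float data of shape (3, 1, 2), mixing negative and positive values.
sample = np.array([[[0.5, -1.0]],
                   [[0.5,  2.0]],
                   [[3.0,  2.0]]])
result = mode_along_axis(sample, axis=0)
print(result)                                # [[0.5 2. ]]
print(np.expand_dims(result, axis=0).shape)  # (1, 1, 2)
```

Note that the intermediate counts array holds one bin per unique value for every position, so for float data with many distinct values it can become large.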

NumPy 1.2, really? You can approximate np.unique(return_inverse=True) reasonably efficiently using np.searchsorted (it's an additional O(n log n), so it shouldn't change the performance significantly):

u = np.unique(arr)
indices = np.searchsorted(u, arr.flat)

score 8 · Accepted Answer

Use SciPy's mode function:

import numpy as np
from scipy.stats import mode

data = np.array([[[ 0,  1,  2,  3,  4],
                  [ 5,  6,  7,  8,  9],
                  [10, 11, 12, 13, 14],
                  [15, 16, 17, 18, 19]],

                 [[ 0,  1,  2,  3,  4],
                  [ 5,  6,  7,  8,  9],
                  [10, 11, 12, 13, 14],
                  [15, 16, 17, 18, 19]],

                 [[40, 40, 42, 43, 44],
                  [45, 46, 47, 48, 49],
                  [50, 51, 52, 53, 54],
                  [55, 56, 57, 58, 59]]])

print(data)

# find mode along the zero-th axis; the return value is a tuple of the
# modes and their counts.
print(mode(data, axis=0))

score 1 · Accepted Answer

In my opinion, a slightly better solution is the following:

tmpL = np.array([3, 2, 3, 2, 5, 2, 2, 3, 3, 2, 2, 2, 3, 3, 2, 2, 3, 2, 3, 2])
unique, counts = np.unique(tmpL, return_counts=True)
most_common = unique[np.argmax(counts)]

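Using np.unique with return_counts=True we get the count of each unique element. The index of the largest element in counts is the index of the corresponding element in unique.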
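One detail worth knowing with this approach is how ties are resolved. The sketch below uses made-up values: because np.unique returns its values sorted and np.argmax returns the first maximum, the smallest of the tied values is the one reported.

```python
import numpy as np

values = np.array([7, 3, 7, 3, 9])
unique, counts = np.unique(values, return_counts=True)
# unique is [3, 7, 9] and counts is [2, 2, 1]: 3 and 7 are tied.

# argmax picks the first maximum, so the smallest tied value is returned.
first_mode = unique[np.argmax(counts)]
print(first_mode)  # 3

# To see every tied value, compare the counts against the maximum count.
all_modes = unique[counts == counts.max()]
print(all_modes)  # [3 7]
```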

score 0 · Accepted Answer

Flatten your array, then build a collections.Counter from it. As usual, take special care when comparing floating-point numbers.
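A minimal sketch of the Counter idea, including one possible way to handle the floating-point caveat. Rounding to 6 decimals before counting is an assumption chosen for illustration, and the right tolerance depends on your data.

```python
from collections import Counter

import numpy as np

# 0.1 + 0.2 is not exactly equal to 0.3 in floating point.
arr = np.array([[0.1 + 0.2, 0.3],
                [0.3,       1.5]])

# Counting the raw values treats 0.1 + 0.2 and 0.3 as different keys.
raw = Counter(arr.ravel().tolist())
print(len(raw))  # 3 distinct keys

# Rounding first (6 decimals is an arbitrary choice) merges near-equal values.
rounded = Counter(np.round(arr, 6).ravel().tolist())
value, count = rounded.most_common(1)[0]
print(value, count)
```

This works on the whole flattened array; applying it per location along one axis would need a loop over positions, which is likely to be slow for large arrays.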

score 0 · Accepted Answer

Explaining @ecatmurs part

u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(arr.shape),
                                None, np.max(indices) + 1), axis=axis)]

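in a bit more detail, and restructuring it to be easier to follow on re-reading (I used this solution, and a few weeks later I was wondering what was going on in this function):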

axis = 0
uniques, indices = np.unique(arr, return_inverse=True)

args_for_bincount_fn = None, np.max(indices) + 1
binned_indices = np.apply_along_axis(np.bincount,
                            axis,
                            indices.reshape(arr.shape),
                            *args_for_bincount_fn)

most_common = uniques[np.argmax(binned_indices,axis=axis)]
