python - How to group rows in a NumPy 2D matrix based on column values?

Question

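What is an efficient (fast and simple) way to group the rows of a 2D NumPy matrix by different column conditions (for example, by the values in column 2) and then run f1() and f2() on each group?

Thanks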

score 10 · Accepted Answer

If you have an array arr of shape (rows, cols), you can get the vector of all values in column 2 as

col = arr[:, 2]

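You can then build a boolean array from your grouping condition. For example, if group 1 consists of the rows whose value in column 2 is greater than 5: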

idx = col > 5

You can apply this boolean array directly to the original array to select rows:

group_1 = arr[idx]
group_2 = arr[~idx]

For example:

>>> import numpy as np
>>> arr = np.random.randint(10, size=(6,4))
>>> arr
array([[0, 8, 7, 4],
       [5, 2, 6, 9],
       [9, 5, 7, 5],
       [6, 9, 1, 5],
       [8, 0, 5, 8],
       [8, 2, 0, 6]])
>>> idx = arr[:, 2] > 5
>>> arr[idx]
array([[0, 8, 7, 4],
       [5, 2, 6, 9],
       [9, 5, 7, 5]])
>>> arr[~idx]
array([[6, 9, 1, 5],
       [8, 0, 5, 8],
       [8, 2, 0, 6]])
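
The boolean mask above splits the rows into two groups. To form one group per distinct value in a column and run a function on each, one possible sketch in plain NumPy is to sort the rows by that column and split at the boundaries between keys. The helper name `group_rows` and the stand-in function `f1` below are hypothetical, chosen only for illustration; the sample data is the array from the example above.

```python
import numpy as np

def group_rows(arr, col):
    """Return (keys, groups): one sub-array of rows per distinct value in column `col`."""
    # A stable sort keeps the original row order inside each group.
    order = np.argsort(arr[:, col], kind="stable")
    sorted_arr = arr[order]
    # On sorted data, return_index gives the first row of each group.
    keys, starts = np.unique(sorted_arr[:, col], return_index=True)
    groups = np.split(sorted_arr, starts[1:])
    return keys, groups

def f1(group):
    # Hypothetical per-group function: column-wise sum of the group's rows.
    return group.sum(axis=0)

arr = np.array([[0, 8, 7, 4],
                [5, 2, 6, 9],
                [9, 5, 7, 5],
                [6, 9, 1, 5],
                [8, 0, 5, 8],
                [8, 2, 0, 6]])

keys, groups = group_rows(arr, 2)
results = {int(k): f1(g) for k, g in zip(keys, groups)}
for key, value in results.items():
    print(key, value)
```

The sort and split are vectorized, so only the call to `f1` runs in a Python loop, once per group rather than once per row.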

score 6 · Accepted Answer

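A compact solution is to use numpy_indexed (disclaimer: I am its author), which implements a fully vectorized solution to this type of problem.

The simplest way to use it is: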

import numpy_indexed as npi
npi.group_by(arr[:, col1]).mean(arr)

But this also works:

# run function f1 on each group, formed by keys which are the rows of arr[:, [col1, col2]]
npi.group_by(arr[:, [col1, col2]], arr, f1)
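
If installing an extra package is not an option, a comparable per-group mean over a two-column key can be sketched with plain NumPy. This is not a description of how numpy_indexed works internally; it is only an illustration built on `np.unique` and `np.add.at`, and the sample array and the values of `col1` and `col2` are assumptions made for the example.

```python
import numpy as np

# Hypothetical sample data: columns 0 and 1 together form the grouping key.
arr = np.array([[1, 0, 10.0],
                [2, 1, 20.0],
                [1, 0, 30.0],
                [2, 1, 40.0],
                [1, 1, 50.0]])
col1, col2 = 0, 1

# Unique key rows, the group index of every input row, and the group sizes.
keys, inverse, counts = np.unique(
    arr[:, [col1, col2]], axis=0, return_inverse=True, return_counts=True
)
inverse = inverse.reshape(-1)  # guard against NumPy versions that return a 2D inverse

# Accumulate each row into the slot of its group, then divide by the group size.
sums = np.zeros((len(keys), arr.shape[1]))
np.add.at(sums, inverse, arr)
means = sums / counts[:, None]

print(keys)
print(means)
```

`np.add.at` is used instead of `sums[inverse] += arr` because the latter applies only one update per repeated index.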

score 1 · Accepted Answer

from operator import itemgetter
sorted(my_numpy_array,key=itemgetter(1))

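Or something like this:

from itertools import groupby
from operator import itemgetter
# groupby only merges adjacent items, so sort by the same key first
rows = sorted(my_numpy_array, key=itemgetter(1))
for key, group in groupby(rows, key=itemgetter(1)):
    print(key, list(group))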
