Consider a simple record array structure:
import numpy as np
ijv_dtype = [
('I', 'i'),
('J', 'i'),
('v', 'd'),
]
ijv = np.array([
(0, 0, 3.3),
(0, 1, 1.1),
(0, 1, 4.4),
(1, 1, 2.2),
], ijv_dtype)
print(ijv) # [(0, 0, 3.3) (0, 1, 1.1) (0, 1, 4.4) (1, 1, 2.2)]
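I would like to aggregate summary statistics (sum, min, max, etc.) of v by grouping on unique combinations of I and J. Thinking in SQL terms, the expected result is: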
select i, j, sum(v) as v from ijv group by i, j;
i | j | v
---+---+-----
0 | 0 | 3.3
0 | 1 | 5.5
1 | 1 | 2.2
(The order is not important.)
The best NumPy approach I can come up with is ugly, and I'm not sure I've ordered the result correctly (although it seems to work here):
# Get unique groups, index and inverse
u_ij, idx_ij, inv_ij = np.unique(ijv[['I', 'J']], return_index=True, return_inverse=True)
# Assemble aggregate
a_ijv = np.zeros(len(u_ij), ijv_dtype)
a_ijv['I'] = u_ij['I']
a_ijv['J'] = u_ij['J']
a_ijv['v'] = [ijv['v'][inv_ij == i].sum() for i in range(len(u_ij))]
print(a_ijv) # [(0, 0, 3.3) (0, 1, 5.5) (1, 1, 2.2)]
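On the ordering concern: the inverse array returned by np.unique holds, for each input row, the position of its key in the unique array, so an aggregate indexed by group number lines up with u_ij. The sketch below checks that property and replaces the list comprehension with np.bincount for the sum. The packed keys array and the names order_ok and sums are introduced only for this illustration, and the code has not been verified against NumPy 1.4.1.

```python
import numpy as np

ijv_dtype = [('I', 'i'), ('J', 'i'), ('v', 'd')]
ijv = np.array([
    (0, 0, 3.3),
    (0, 1, 1.1),
    (0, 1, 4.4),
    (1, 1, 2.2),
], ijv_dtype)

# Copy the two key fields into their own packed structured array
# (illustrative helper; avoids depending on multi-field indexing behavior)
keys = np.zeros(len(ijv), dtype=[('I', 'i'), ('J', 'i')])
keys['I'] = ijv['I']
keys['J'] = ijv['J']

# u_ij: sorted unique (I, J) pairs; inv_ij[n]: group number of row n
u_ij, inv_ij = np.unique(keys, return_inverse=True)
inv_ij = np.ravel(inv_ij)  # make sure the inverse is 1-D

# Indexing the unique keys by the inverse rebuilds the original keys,
# so group number k always refers to u_ij[k]
order_ok = bool(np.all(u_ij[inv_ij] == keys))

# Sum v per group in one vectorized call
sums = np.bincount(inv_ij, weights=ijv['v'], minlength=len(u_ij))

print(order_ok)
print(sums)
```

np.bincount only covers sums (and counts, when called without weights); min and max need a different mechanism.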
I'd like to think there is a better way to do this! I'm using NumPy 1.4.1.
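One possible direction, offered as a sketch and not as a definitive answer: sort the rows by (I, J), find where each group starts, and apply ufunc.reduceat to get sum, min and max in the same pass. np.lexsort and reduceat are long-standing NumPy features, but this has not been checked on 1.4.1. The output field names v_sum, v_min and v_max are made up for the example.

```python
import numpy as np

ijv_dtype = [('I', 'i'), ('J', 'i'), ('v', 'd')]
ijv = np.array([
    (0, 0, 3.3),
    (0, 1, 1.1),
    (0, 1, 4.4),
    (1, 1, 2.2),
], ijv_dtype)

# Sort rows by (I, J); np.lexsort treats the LAST key as the primary key
order = np.lexsort((ijv['J'], ijv['I']))
s = ijv[order]

# A group starts at row 0 and wherever I or J differs from the previous row
is_start = np.ones(len(s), dtype=bool)
is_start[1:] = (s['I'][1:] != s['I'][:-1]) | (s['J'][1:] != s['J'][:-1])
starts = np.flatnonzero(is_start)

# Contiguous copy of the sorted values, reduced over each [start, next start) run
v_sorted = np.ascontiguousarray(s['v'])

agg_dtype = [('I', 'i'), ('J', 'i'),
             ('v_sum', 'd'), ('v_min', 'd'), ('v_max', 'd')]
agg = np.zeros(len(starts), agg_dtype)
agg['I'] = s['I'][starts]
agg['J'] = s['J'][starts]
agg['v_sum'] = np.add.reduceat(v_sorted, starts)
agg['v_min'] = np.minimum.reduceat(v_sorted, starts)
agg['v_max'] = np.maximum.reduceat(v_sorted, starts)

print(agg)
```

Because the rows are sorted first, the groups come out in ascending (I, J) order, matching the order np.unique would produce.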