python - python pandas groupby() results

Question

I have the following python pandas data frame:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
})

df
    A  B  C
0   1  5  1
1   1  5  1
2   1  6  1
3   1  7  1
4   2  5  1
5   2  6  1
6   2  6  1
7   3  7  1
8   3  7  1
9   4  6  1
10  4  7  1
11  4  7  1

I would like another column that stores the sum of the C values for each fixed combination of A and B. That is, something like:

    A  B  C  D
0   1  5  1  2
1   1  5  1  2
2   1  6  1  1
3   1  7  1  1
4   2  5  1  1
5   2  6  1  2
6   2  6  1  2
7   3  7  1  2
8   3  7  1  2
9   4  6  1  1
10  4  7  1  2
11  4  7  1  2

I have tried using pandas groupby, and it works:

res = {}
for a, group_by_A in df.groupby('A'):
    group_by_B = group_by_A.groupby('B', as_index = False)
    res[a] = group_by_B['C'].sum()

But I don't know how to get the results from res back into df in an orderly fashion. I would be very happy with any advice on this. Thank you.

score 23 · Accepted Answer

Here's one way (though it feels like this should work with apply, I can't figure out how).

In [11]: g = df.groupby(['A', 'B'])

In [12]: df1 = df.set_index(['A', 'B'])

The groupby size function is what you want; we have to match it to 'A' and 'B' as the index:

In [13]: df1['D'] = g.size()  # unfortunately this doesn't play nice with as_index=False
# Same would work with g['C'].sum()

In [14]: df1.reset_index()
Out[14]:
    A  B  C  D
0   1  5  1  2
1   1  5  1  2
2   1  6  1  1
3   1  7  1  1
4   2  5  1  1
5   2  6  1  2
6   2  6  1  2
7   3  7  1  2
8   3  7  1  2
9   4  6  1  1
10  4  7  1  2
11  4  7  1  2
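
One thing worth spelling out: size() counts rows per group, while g['C'].sum() adds up the C values. They agree above only because every C in the question is 1. A minimal sketch with made-up C values (not the question's data) to show the difference:

```python
import pandas as pd

# Hypothetical data: C values deliberately differ from 1
df = pd.DataFrame({
    'A': [1, 1, 2],
    'B': [5, 5, 6],
    'C': [10, 20, 30],
})

g = df.groupby(['A', 'B'])

sizes = g.size()      # number of rows in each (A, B) group
sums = g['C'].sum()   # total of C in each (A, B) group

print(sizes.tolist())  # [2, 1]
print(sums.tolist())   # [30, 30]
```

If the goal is really "sum of C", as the question states, the sum() variant is probably the safer choice.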

score 13 · Accepted Answer

You can also do a one-liner using transform applied to the groupby:

df['D'] = df.groupby(['A','B'])['C'].transform('sum')
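
For completeness, here is that one-liner as a self-contained sketch on the data frame from the question. transform returns a result with the same index as df, so it can be assigned directly as a new column without any reindexing:

```python
import pandas as pd

# Data frame from the question
df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
})

# Sum C within each (A, B) group and broadcast it back to every row
df['D'] = df.groupby(['A', 'B'])['C'].transform('sum')

print(df['D'].tolist())  # [2, 2, 1, 1, 1, 2, 2, 2, 2, 1, 2, 2]
```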

score 8 · Accepted Answer

You can also do a one-liner using merge, as follows:

df = df.merge(pd.DataFrame({'D':df.groupby(['A', 'B'])['C'].size()}), left_on=['A', 'B'], right_index=True)
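
A slightly more spread-out variant of the same idea, as a sketch: aggregate with as_index=False so the group keys stay as ordinary columns, then left-merge on them. This uses sum() rather than size(), on the assumption that the sum of C is what is wanted; the names sums and merged are just for illustration.

```python
import pandas as pd

# Data frame from the question
df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
})

# One row per (A, B) pair, with the summed C renamed to D
sums = (
    df.groupby(['A', 'B'], as_index=False)['C']
      .sum()
      .rename(columns={'C': 'D'})
)

# A left merge keeps every row of df, in df's original order
merged = df.merge(sums, on=['A', 'B'], how='left')

print(merged['D'].tolist())  # [2, 2, 1, 1, 1, 2, 2, 2, 2, 1, 2, 2]
```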

score 0 · Accepted Answer

You can use this approach:

columns = ['col1','col2',...]
df.groupby('col')[columns].sum()

If you like, you can also add .sort_values(by = 'colx', ascending = True/False) after .sum() to sort the final output by a specific column (colx) in ascending or descending order.
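
A concrete sketch of this on the question's data frame, grouping by 'A' and summing 'B' and 'C' (the column choice is only for illustration). Note that this returns one row per group rather than adding a column to the original rows:

```python
import pandas as pd

# Data frame from the question
df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
})

columns = ['B', 'C']

# Sum the chosen columns per value of A, then sort by the B totals, largest first
totals = df.groupby('A')[columns].sum().sort_values(by='B', ascending=False)

print(totals.index.tolist())  # [1, 4, 2, 3]
print(totals['B'].tolist())   # [23, 20, 17, 14]
```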
