Here is a simple DataFrame:
   Acid  Balance_1  CustID  Balance_2
0     1   0.082627       1        NaN
1     2   0.397579       1   0.459942
2     3   0.201596       2   0.596573
3     4   0.616448       3   0.705697
4     5   0.844865       3   0.483279
5     6        NaN       4   0.360260
After grouping by CustID, I have been trying to apply an aggregation function:
groupby_obj = time_series.groupby(["CustID"])
df = groupby_obj.agg(set)
This returns:
Acid \
CustID
1 set([Balance_1, Balance_2, Acid, CustID])
2 set([Balance_1, Balance_2, Acid, CustID])
3 set([Balance_1, Balance_2, Acid, CustID])
4 set([Balance_1, Balance_2, Acid, CustID])
Balance_1 \
CustID
1 set([Balance_1, Balance_2, Acid, CustID])
2 set([Balance_1, Balance_2, Acid, CustID])
3 set([Balance_1, Balance_2, Acid, CustID])
4 set([Balance_1, Balance_2, Acid, CustID])
Balance_2
CustID
1 set([Balance_1, Balance_2, Acid, CustID])
2 set([Balance_1, Balance_2, Acid, CustID])
3 set([Balance_1, Balance_2, Acid, CustID])
4 set([Balance_1, Balance_2, Acid, CustID])
rather than what I expected it to do:
        Acid         Balance_1                  Balance_2
CustID
1       set([1, 2])  set([0.082627, 0.397579])  set([NaN, 0.459942])
etc for the other CustIDs...
Why does the aggregation fill the DataFrame with sets of all the column headers?
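One possible explanation, offered as a sketch rather than a confirmed diagnosis: iterating over a DataFrame yields its column labels, so if the pandas version in use hands each group's whole sub-DataFrame to `set`, the result would be the headers rather than the values. The example below rebuilds the data from the table above; the names `column_labels` and `per_column_sets` are illustrative, and wrapping `set` in a lambda is a commonly suggested way to get per-column sets, not something verified against every pandas release.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
time_series = pd.DataFrame({
    "Acid": [1, 2, 3, 4, 5, 6],
    "Balance_1": [0.082627, 0.397579, 0.201596, 0.616448, 0.844865, np.nan],
    "CustID": [1, 1, 2, 3, 3, 4],
    "Balance_2": [np.nan, 0.459942, 0.596573, 0.705697, 0.483279, 0.360260],
})

# Iterating over a DataFrame yields column labels, not cell values,
# so set() applied to a whole frame collects the headers.
column_labels = set(time_series)
print(column_labels)

# Applying set to each column of each group gives per-column sets of values.
groupby_obj = time_series.groupby("CustID")
per_column_sets = groupby_obj.agg(lambda col: set(col))
print(per_column_sets)
```

If only one column is needed, selecting it first (for example `groupby_obj["Acid"]`) before aggregating keeps the function operating on a single Series.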
Thanks, Anne