python - apply() method: normalizing the first column by the sum of the second column

Question

I can't understand how this function works:

""" the apply() method lets you apply an arbitrary function to the group 
result. The function take a DataFrame and returns a Pandas object (a df or 
series) or a scalar.
For example: normalize the first column by the sum of the second"""

def norm_by_data2(x):
    # x is a DataFrame of group values
    x['data1'] /= x['data2'].sum()
    return x
print (df); print (df.groupby('key').apply(norm_by_data2))

(From: "Python Data Science Handbook", Jake VanderPlas, p. 167)

This returns:

  key  data1  data2
0   A      0      5
1   B      1      0
2   C      2      3
3   A      3      3
4   B      4      7
5   C      5      9

  key     data1  data2
0   A  0.000000      5
1   B  0.142857      0
2   C  0.166667      3
3   A  0.375000      3
4   B  0.571429      7
5   C  0.416667      9
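For comparison, here is a minimal sketch of the same normalization written without apply(), assuming the same sample data as above. It swaps in groupby(...).transform('sum'), which returns each group's data2 total aligned to every original row, so no function has to mutate the group DataFrames. The names group_totals and normalized are just illustrative.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': [5, 0, 3, 3, 7, 9]})

# transform('sum') broadcasts each group's data2 total back onto every row,
# so the result lines up with df's original index: 8, 7, 12, 8, 7, 12
group_totals = df.groupby('key')['data2'].transform('sum')

# Divide each row's data1 by the total of its own group
normalized = df.assign(data1=df['data1'] / group_totals)
print(normalized)
```

The printed data1 column should match the apply() output above, since both divide each row by the data2 sum of the group that row belongs to.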

For me, the best way to understand how it works is to compute the values by hand.

Could someone explain how to arrive, by hand, at the second value of the "data1" column: 0.142857?

Is it 1/7? But where do these values come from?

Thanks!

score 1 · Accepted Answer

I got it!!

The sum of the data2 column within each group is:

A: 5 + 3 =  8
B: 0 + 7 =  7
C: 3 + 9 = 12
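These per-group sums can be checked directly in pandas. A minimal sketch, assuming the same sample data as in the question (the variable name sums is illustrative):

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': [5, 0, 3, 3, 7, 9]})

# One data2 total per key: this is x['data2'].sum() for each group x
sums = df.groupby('key')['data2'].sum()
print(sums)
```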

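For example, to get 0.142857, divide the data1 value of row 1 (which is 1) by the sum for group B (which is 7): 1/7 ≈ 0.142857. Every other row works the same way: its data1 value divided by the data2 sum of its own group.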
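The whole manual calculation can be replayed in plain Python, without pandas, as a sketch. The rows are copied from the question's DataFrame; the names rows, totals and normalized are illustrative only.

```python
# Rows from the question's DataFrame: (key, data1, data2)
rows = [('A', 0, 5), ('B', 1, 0), ('C', 2, 3),
        ('A', 3, 3), ('B', 4, 7), ('C', 5, 9)]

# Step 1: total data2 per group
totals = {}
for key, _, data2 in rows:
    totals[key] = totals.get(key, 0) + data2

# Step 2: divide each row's data1 by its own group's total
normalized = [data1 / totals[key] for key, data1, _ in rows]

for (key, data1, _), value in zip(rows, normalized):
    print(f"{key}: {data1}/{totals[key]} = {value:.6f}")
# A: 0/8 = 0.000000
# B: 1/7 = 0.142857
# C: 2/12 = 0.166667
# A: 3/8 = 0.375000
# B: 4/7 = 0.571429
# C: 5/12 = 0.416667
```

Each printed value matches the data1 column returned by apply(norm_by_data2).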
