python - What is the pythonic way to do a conditional count across pandas dataframe rows with apply?

Question

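I'm trying to do a conditional count across records in a pandas dataframe. I'm new to Python and have a working solution using a for loop, but it takes a long time to run on a large dataframe with about 200k rows. I believe there is a better way to do this by defining a function and using apply, but I'm having trouble figuring it out.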

Here's a simple example.

Create a pandas dataframe with two columns:

import pandas as pd
data = {'color': ['blue','green','yellow','blue','green','yellow','orange','purple','red','red'], 
        'weight': [4,5,6,4,1,3,9,8,4,1]
       }
df = pd.DataFrame(data)

# for each row, count the number of other rows with the same color and a lesser weight
counts = []
for i in df.index:
    c = df.loc[i, 'color']
    w = df.loc[i, 'weight']
    
    ct = len(df.loc[(df['color']==c) & (df['weight']<w)])
    counts.append(ct)

df['counts, same color & less weight'] = counts

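For each record, the 'counts, same color & less weight' column is meant to hold the count of other records in df that have the same color and a lesser weight. For example, the result for row 0 (blue, 4) is zero because no other record with color=='blue' has a lesser weight. The result for row 1 (green, 5) is 1 because row 4 also has color=='green' but weight==1.

How can I define a function that can be applied to the dataframe to achieve the same thing?

I'm familiar with apply, for example to square the weight column I'd use: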

df['weight squared'] = df['weight'].apply(lambda x: x**2)

...but I'm unclear how to use apply to do a conditional calculation that refers to the entire df.

Thanks in advance for any help.

score 1 · Accepted Answer

We can do this with groupby and transform('min'):

df.weight.gt(df.groupby('color').weight.transform('min')).astype(int)
0    0
1    1
2    1
3    0
4    0
5    0
6    0
7    0
8    1
9    0
Name: weight, dtype: int64
#df['c...]=df.weight.gt(df.groupby('color').weight.transform('min')).astype(int)
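
Note that comparing against the group minimum gives a 0/1 flag, which matches the loop's output for the sample data only because no color appears more than twice. For larger groups, one possible sketch of a true count uses a per-group rank: with method='min', a row's rank is 1 plus the number of rows in its group with a strictly smaller weight. The helper names below are hypothetical, and the sketch assumes the weight column has no missing values. A row-wise apply version is included only for comparison, since it rescans the whole frame for every row, much like the original loop.

```python
import pandas as pd

data = {'color': ['blue', 'green', 'yellow', 'blue', 'green', 'yellow',
                  'orange', 'purple', 'red', 'red'],
        'weight': [4, 5, 6, 4, 1, 3, 9, 8, 4, 1]}
df = pd.DataFrame(data)


def count_lighter_same_color(frame):
    """Hypothetical helper: per row, count same-color rows with a lesser weight.

    rank(method='min') gives 1 + (number of strictly smaller values in the
    group), so subtracting 1 yields the count. Assumes no NaN weights.
    """
    ranks = frame.groupby('color')['weight'].rank(method='min')
    return ranks.sub(1).astype(int)


def count_lighter_same_color_apply(frame):
    """Hypothetical row-wise apply version: direct but quadratic in row count."""
    return frame.apply(
        lambda row: int(((frame['color'] == row['color'])
                         & (frame['weight'] < row['weight'])).sum()),
        axis=1,
    )


counts = count_lighter_same_color(df)
df['counts, same color & less weight'] = counts
```

The rank-based version avoids rescanning the frame for each row, so it should scale better to a frame of about 200k rows than either the loop or the row-wise apply.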
