Suppose this is my DataFrame:

country     Edition   sports       Athletes        Medal      Gender   Score
Germany     1990    Aquatics  HAJOS, Alfred          gold       M        3
Germany     1990    Aquatics  HIRSCHMANN, Otto       silver     M        2
Germany     1990    Aquatics  DRIVAS, Dimitrios      gold       W        3
Germany     1990    Aquatics  DRIVAS, Dimitrios      silver     W        2
US          2008    Athletics MALOKINIS, Ioannis     gold       M        1
US          2008    Athletics HAJOS, Alfred          silver     M        2
US          2009    Athletics CHASAPIS, Spiridon     gold       W        3
France      2010    Athletics CHOROPHAS, Efstathios  gold       W        3
France      2010    Athletics CHOROPHAS, Efstathios  gold       M        3
France      2010    golf      HAJOS, Alfred          Bronze     M        1
France      2011    golf      ANDREOU, Joannis       silver     W        2
Spain       2011    golf      BURKE, Thomas          gold       M        3

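I am trying to find out how many countries have a sum of men's scores equal to the sum of women's scores. I tried the following:

sum_men = df[df['Gender'] == 'M'].groupby('country')['Score'].sum()
sum_women = df[df['Gender'] == 'W'].groupby('country')['Score'].sum()

Now I don't know how to compare the two and get the number of countries where the men's total equals the women's total.

Can anyone help me?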

2 Answers

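Not sure whether you want to keep the countries where the sums are equal or the other ones, but the same logic applies either way: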

group = df.groupby(['country', 'Gender'])['Score'].sum().unstack()
not_equal = group[group.M != group.W]
filtered_df = df[df.country.isin(not_equal.index)]

Output:

   country  Edition     sports               Athletes   Medal Gender  Score  score_sum
7   France     2010  Athletics  CHOROPHAS, Efstathios    gold      W      3          5
8   France     2010  Athletics  CHOROPHAS, Efstathios    gold      M      3          4
9   France     2010       golf          HAJOS, Alfred  Bronze      M      1          4
10  France     2011       golf       ANDREOU, Joannis  silver      W      2          5
11   Spain     2011       golf          BURKE, Thomas    gold      M      3          3
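
For the count the question actually asks for, the same unstacked table can be filtered the other way round. What follows is only a minimal sketch: the `df` literal is an assumption, rebuilt by hand from the table posted in the question (only the three columns needed), and `equal` and `n_equal` are names made up for illustration.

```python
import pandas as pd

# Assumption: frame rebuilt by hand from the question's table (needed columns only).
df = pd.DataFrame({
    "country": ["Germany"] * 4 + ["US"] * 3 + ["France"] * 4 + ["Spain"],
    "Gender": ["M", "M", "W", "W", "M", "M", "W", "W", "M", "M", "W", "M"],
    "Score": [3, 2, 3, 2, 1, 2, 3, 3, 3, 1, 2, 3],
})

# One row per country, one column per gender.
# Spain has no W rows, so its W cell is NaN after unstack().
group = df.groupby(["country", "Gender"])["Score"].sum().unstack()

# A comparison with NaN is False, so a country missing one gender
# is not counted as "equal" here.
equal = group[group["M"] == group["W"]]
n_equal = len(equal)

print(equal.index.tolist(), n_equal)
```

Note that with `!=` the NaN comparison is True instead, which is why Spain shows up among the countries that are not equal.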
Answered 2019-12-18T21:59:14.017

You could do it like this:

# The question's frame names this column 'country' (lowercase); adjust the name to match your data.
sum_men = df[df['Gender'] == 'M'].groupby('Country')['Score'].sum().reset_index()  # note the reset_index()
sum_women = df[df['Gender'] == 'W'].groupby('Country')['Score'].sum().reset_index()
new_df = sum_men.merge(sum_women, on="Country")  # Score_x = men, Score_y = women
new_df['diff'] = new_df['Score_x'] - new_df['Score_y']
new_df

   Country  Score_x  Score_y  diff
0   France        4        5    -1
1  Germany        5        5     0
2       US        3        3     0

print(new_df[new_df['diff']==0])

   Country  Score_x  Score_y  diff
1  Germany        5        5     0
2       US        3        3     0
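
The default inner merge drops any country that has scores for only one gender (Spain, in the question's data). If such a country should instead be treated as having a total of 0 for the missing gender, one possible variant is to skip `reset_index()` and subtract the two Series directly, which aligns them on the country index. This is only a sketch under that assumption: the `df` literal is rebuilt by hand from the question's table, it uses the lowercase `country` column from the question, and `diff` and `equal_countries` are illustrative names.

```python
import pandas as pd

# Assumption: frame rebuilt by hand from the question's table (needed columns only).
df = pd.DataFrame({
    "country": ["Germany"] * 4 + ["US"] * 3 + ["France"] * 4 + ["Spain"],
    "Gender": ["M", "M", "W", "W", "M", "M", "W", "W", "M", "M", "W", "M"],
    "Score": [3, 2, 3, 2, 1, 2, 3, 3, 3, 1, 2, 3],
})

sum_men = df[df["Gender"] == "M"].groupby("country")["Score"].sum()
sum_women = df[df["Gender"] == "W"].groupby("country")["Score"].sum()

# Series.sub aligns on the country index; fill_value=0 treats a
# missing gender as a total of 0 instead of producing NaN.
diff = sum_men.sub(sum_women, fill_value=0)

# Countries whose men's and women's totals match.
equal_countries = diff[diff == 0].index.tolist()

print(equal_countries, len(equal_countries))
```

Whether a missing gender should count as 0 or exclude the country altogether depends on what "equal" is meant to cover, so pick the variant that matches the intent.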
Answered 2019-12-18T22:00:40.373