python - pandas groupby values in different columns

Question

I have this dataframe:

import pandas as pd

frame = pd.DataFrame({'player1': ['Joe', 'Steve', 'Bill', 'Doug', 'Steve', 'Bill', 'Joe', 'Steve'],
                      'player2': ['Bill', 'Doug', 'Steve', 'Joe', 'Bill', 'Steve', 'Doug', 'Bill'],
                      'winner':  ['Joe', 'Steve', 'Steve', 'Doug', 'Bill', 'Steve', 'Doug', 'Steve'],
                      'loser':   ['Bill', 'Doug', 'Bill', 'Joe', 'Steve', 'Bill', 'Joe', 'Bill'],
                      'ones':    1})

I can keep a running total of how many times the winner has won by doing this:

frame['winners_wins'] = frame.groupby('winner')['ones'].cumsum()

I would like to keep a running count of wins for player1 and for player2. I think I should be able to do this with groupby, but I don't know how to write it.

Edit:

I didn't phrase it very well the first time. I want to keep track of each individual player, so the desired output would be:

player1  player2  winner  loser  player1_wins  player2_wins
Joe      Bill     Joe     Bill              1             0
Steve    Doug     Steve   Doug              1             0
Bill     Steve    Steve   Bill              0             2
Doug     Joe      Doug    Joe               1             1
Steve    Bill     Bill    Steve             2             1
Bill     Steve    Steve   Bill              1             3
Joe      Doug     Doug    Joe               1             2
Steve    Bill     Steve   Bill              3             1

score 1 · Accepted Answer

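It looks like you want a running total of player1's and player2's wins. Here is a pretty vanilla way to do it that uses plain Python rather than Pandas.

Computations that have to step through the rows in order, using the previous result to compute the next row, tend not to map well onto Pandas/NumPy operations (cumsum is an exception). So I don't think there is a slick way to do this with Pandas operations, but I could be wrong.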
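A hedged aside on the cumsum point above: one possible vectorized approach, not part of this answer, is to take the running sum per player instead of per row. The sketch below builds a 0/1 table of who won each game with pd.get_dummies (assuming pandas 0.23 or later for its dtype argument), cumulatively sums that table, and then picks each row's player1 and player2 totals out of it by position. The reindex step makes sure a player who appears in a seat but never wins still gets a column of zeros.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'player1': ['Joe', 'Steve', 'Bill', 'Doug', 'Steve', 'Bill', 'Joe', 'Steve'],
                      'player2': ['Bill', 'Doug', 'Steve', 'Joe', 'Bill', 'Steve', 'Doug', 'Bill'],
                      'winner':  ['Joe', 'Steve', 'Steve', 'Doug', 'Bill', 'Steve', 'Doug', 'Steve'],
                      'loser':   ['Bill', 'Doug', 'Bill', 'Joe', 'Steve', 'Bill', 'Joe', 'Bill']})

# Every player seen in either seat, so players with zero wins still get a column.
players = pd.unique(frame[['player1', 'player2']].to_numpy().ravel())

# One column per player: 1 if that player won the game on this row, else 0.
won = pd.get_dummies(frame['winner'], dtype=int).reindex(columns=players, fill_value=0)

# Running win total per player, including the current row's game.
running = won.cumsum().to_numpy()

# For each row, read the running total from the column of the player in each seat.
rows = np.arange(len(frame))
for seat in ('player1', 'player2'):
    cols = pd.Index(players).get_indexer(frame[seat])
    frame[seat + '_wins'] = running[rows, cols]

print(frame)
```

This trades the Python-level loop for one cumsum over a players-by-games table, so memory grows with the number of distinct players.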

import collections

import pandas as pd

df = pd.DataFrame({'player1': ['Joe', 'Steve', 'Bill', 'Doug', 'Steve', 'Bill', 'Joe', 'Steve'],
                   'player2': ['Bill', 'Doug', 'Steve', 'Joe', 'Bill', 'Steve', 'Doug', 'Bill'],
                   'winner':  ['Joe', 'Steve', 'Steve', 'Doug', 'Bill', 'Steve', 'Doug', 'Steve'],
                   'loser':   ['Bill', 'Doug', 'Bill', 'Joe', 'Steve', 'Bill', 'Joe', 'Bill']},
                  columns=['player1', 'player2', 'winner', 'loser'])

def count_wins(frame):
    # Running tally of wins per player; fresh on every call.
    wins = collections.Counter()
    for _, row in frame.iterrows():
        wins[row['winner']] += 1
        # Each seat's total after counting this row's game.
        yield wins[row['player1']], wins[row['player2']]

df['player1_wins'], df['player2_wins'] = zip(*count_wins(df))
print(df)

Prints:

  player1 player2 winner  loser  player1_wins  player2_wins
0     Joe    Bill    Joe   Bill             1             0
1   Steve    Doug  Steve   Doug             1             0
2    Bill   Steve  Steve   Bill             0             2
3    Doug     Joe   Doug    Joe             1             1
4   Steve    Bill   Bill  Steve             2             1
5    Bill   Steve  Steve   Bill             1             3
6     Joe    Doug   Doug    Joe             1             2
7   Steve    Bill  Steve   Bill             4             1
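iterrows() builds a new Series for every row, which tends to be slow on larger frames. The sketch below keeps the answer's Counter tally unchanged but walks the three relevant columns directly with zip(); running_wins is a hypothetical helper name used only for illustration. It should reproduce the player1_wins and player2_wins columns printed above.

```python
import collections

import pandas as pd

df = pd.DataFrame({'player1': ['Joe', 'Steve', 'Bill', 'Doug', 'Steve', 'Bill', 'Joe', 'Steve'],
                   'player2': ['Bill', 'Doug', 'Steve', 'Joe', 'Bill', 'Steve', 'Doug', 'Bill'],
                   'winner':  ['Joe', 'Steve', 'Steve', 'Doug', 'Bill', 'Steve', 'Doug', 'Steve'],
                   'loser':   ['Bill', 'Doug', 'Bill', 'Joe', 'Steve', 'Bill', 'Joe', 'Bill']})

def running_wins(player1, player2, winner):
    # Hypothetical helper: same Counter tally as count_wins, but over plain
    # column values instead of one Series per row from iterrows().
    wins = collections.Counter()
    p1_totals, p2_totals = [], []
    for p1, p2, w in zip(player1, player2, winner):
        wins[w] += 1                   # credit this game's winner first
        p1_totals.append(wins[p1])     # then read both seats' totals
        p2_totals.append(wins[p2])
    return p1_totals, p2_totals

df['player1_wins'], df['player2_wins'] = running_wins(df['player1'], df['player2'], df['winner'])
print(df)
```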

score 1 · Accepted Answer

You don't need that "ones" column, or, really, any grouping.

In [19]: del frame['ones']

In [20]: frame['player1_wins'] = (frame['winner'] == frame['player1']).astype('int').cumsum()

In [21]: frame['player2_wins'] = (frame['winner'] == frame['player2']).astype('int').cumsum()

In [22]: frame
Out[22]: 
   loser player1 player2 winner  player1_wins  player2_wins
0   Bill     Joe    Bill    Joe             1             0
1   Doug   Steve    Doug  Steve             2             0
2   Bill    Bill   Steve  Steve             2             1
3    Joe    Doug     Joe   Doug             3             1
4  Steve   Steve    Bill   Bill             3             2
5   Bill    Bill   Steve  Steve             3             3
6    Joe     Joe    Doug   Doug             3             4
7   Bill   Steve    Bill  Steve             4             4

A way to get winners_wins without resorting to the "ones" column (with numpy imported as np) is:

In [26]: frame['winners_wins'] = frame.groupby('winner').winner.transform(lambda x: np.arange(1, 1 + len(x)))
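An alternative that should give the same winners_wins column (a substitution, not what this answer uses) is GroupBy.cumcount, which numbers the rows of each group 0, 1, 2, ... in their original order. Only the winner column matters here, so the sketch rebuilds just that column:

```python
import pandas as pd

frame = pd.DataFrame({'winner': ['Joe', 'Steve', 'Steve', 'Doug', 'Bill', 'Steve', 'Doug', 'Steve']})

# cumcount() gives each row its 0-based position within its 'winner' group,
# so adding 1 yields the winner's running win total including this game.
frame['winners_wins'] = frame.groupby('winner').cumcount() + 1
print(frame)
```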
