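I have a DataFrame like the one below. It is essentially a list of players and the runs each one scored in an innings. The DataFrame has about 50,000 rows and is sorted by date.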
PLAYER_CODE  PLAYER_RUNS  MATCH_ID
123          10           1
112          5            1
123          15           2
112          10           2
112          24           3
123          10           3
123          5            4
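I need to add two new columns, PREV_TWO and PREV_THREE, which should hold the sum of the player's runs in his previous two and previous three innings, giving me the following DataFrame: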
PLAYER_CODE  PLAYER_RUNS  PREV_TWO  PREV_THREE
123          10           25        30
112          5            34        34
123          15           15        15
112          10           24        24
112          24           0         0
123          10           5         5
123          5            0         0
I came up with the following code to do this:
playerList = dataFrame['PLAYER_CODE'].unique().tolist()
print(len(playerList), "players found in dataframe")
for playerCode in playerList:
    # Create a player-specific DataFrame to loop over its rows
    playerDF = dataFrame[dataFrame['PLAYER_CODE'] == playerCode]
    playerRows = len(playerDF.index)
    i = 0
    for row in playerDF.itertuples():  # loop over this player's rows
        j = i + 3  # slice end covering the next 2 rows
        x = i + 4  # slice end covering the next 3 rows
        # Match ID of the current row, used to identify the unique row to update
        playerMatchId = playerDF.iloc[i]['MATCH_ID']
        # Sum the runs
        sumoflasttwo = playerDF.iloc[i+1:j]['PLAYER_RUNS'].sum()
        sumoflastthree = playerDF.iloc[i+1:x]['PLAYER_RUNS'].sum()
        # Update the main DataFrame
        dataFrame.loc[(dataFrame['MATCH_ID'] == playerMatchId) &
                      (dataFrame['PLAYER_CODE'] == playerCode), 'PREV_TWO'] = sumoflasttwo
        dataFrame.loc[(dataFrame['MATCH_ID'] == playerMatchId) &
                      (dataFrame['PLAYER_CODE'] == playerCode), 'PREV_THREE'] = sumoflastthree
        i = i + 1
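This approach works, but it is really slow. I'm fairly sure there is a way to do this without loops, but I don't know what it is. Is there a way to do this without iterating over the DataFrame?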
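One loop-free direction, as a minimal sketch rather than a definitive answer: it assumes each player's rows are ordered so that his "previous" innings are the rows that follow the current one, which is what the sample tables and the `iloc[i+1:...]` slices imply. Under that assumption, `groupby` combined with `shift` using negative periods can build both columns. The names `df`, `runs`, `next1`, `next2`, and `next3` are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "PLAYER_CODE": [123, 112, 123, 112, 112, 123, 123],
    "PLAYER_RUNS": [10, 5, 15, 10, 24, 10, 5],
    "MATCH_ID":    [1, 1, 2, 2, 3, 3, 4],
})

# Group the runs by player so shifts never cross from one player to another
runs = df.groupby("PLAYER_CODE")["PLAYER_RUNS"]

# shift(-k) pulls the value from k rows further down within the same player;
# rows with no such neighbour become NaN, which is treated as 0 runs here
next1 = runs.shift(-1).fillna(0)
next2 = runs.shift(-2).fillna(0)
next3 = runs.shift(-3).fillna(0)

# Results stay aligned with the original index, so they can be assigned directly
df["PREV_TWO"] = (next1 + next2).astype(int)
df["PREV_THREE"] = (next1 + next2 + next3).astype(int)

print(df[["PLAYER_CODE", "PLAYER_RUNS", "PREV_TWO", "PREV_THREE"]])
```

On the sample data this reproduces the PREV_TWO and PREV_THREE values shown in the expected table. If the real data were sorted oldest-first instead, positive shifts (`shift(1)`, `shift(2)`, `shift(3)`) would be the ones to use.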