I am trying to outer join (on df1) two pandas DataFrames. Here are the sample DataFrames:

df1:
Index   Team 1   Team 2   Team1_Score    Team2_Score
 0       A        B        25              56
 1       B        C        30              55
 2       D        E        35              75

df2:
Index   Team 1   Team 2   Team1_Avg     Team2_Avg
 0       A        B        5              15
 1       G        F        10             25
 2       C        B        15             35

dfcombined:
Index   Team 1   Team 2   Team1_Score    Team2_Score    Team1_Avg     Team2_Avg
 0       A        B        25              56           5             15
 1       B        C        30              55           35            15
 2       D        E        35              75        

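I tried using the pandasql module, but I am not sure how to handle the case where index 1 in df1 has to join with index 2 in df2, because the order of the teams is reversed. With pandasql, I don't see how to swap the team averages in the combined DataFrame when the team order is reversed.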

I would appreciate any help with this.

2 Answers

Setup -

df1

      Team 1 Team 2  Team1_Score  Team2_Score
Index                                        
0          A      B           25           56
1          B      C           30           55
2          D      E           35           75

df2

      Team 1 Team 2  Team1_Avg  Team2_Avg
Index                                    
0          A      B          5         15
1          F      G         25         10
2          B      C         35         15

First, we need to sort the Team * columns within each row, and reorder the Team*_Score columns in the same way. We'll use argsort to do this.

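import numpy as np

# Row positions as a column vector, so they broadcast against j
i = np.arange(len(df1))[:, None]
# Per-row column order that puts the two team names in alphabetical order
j = np.argsort(df1[['Team 1', 'Team 2']].values, axis=1)

df1[['Team 1', 'Team 2']] = df1[['Team 1', 'Team 2']].values[i, j]
df1[['Team1_Score', 'Team2_Score']] = df1[['Team1_Score', 'Team2_Score']].values[i, j]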
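
To see what the i, j indexing is doing, here is a minimal sketch on toy NumPy arrays. The arrays teams and scores are made up for illustration and are not the DataFrames from the question; the point is only that the same per-row permutation can be applied to a second array.

```python
import numpy as np

# Toy data (hypothetical): one pair of team names and one pair of scores per row.
teams = np.array([["B", "A"], ["C", "D"], ["F", "E"]])
scores = np.array([[56, 25], [30, 55], [75, 35]])

# i is a column of row positions; j holds, for each row, the column order
# that sorts that row's team names alphabetically.
i = np.arange(len(teams))[:, None]
j = np.argsort(teams, axis=1)

sorted_teams = teams[i, j]    # names in alphabetical order within each row
sorted_scores = scores[i, j]  # scores reordered with the same permutation

print(sorted_teams)
print(sorted_scores)
```

Because i has shape (n, 1) and j has shape (n, 2), they broadcast to pick element [row, j[row, k]] for every row and column, which is what keeps each score attached to its team.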

Now, repeat the same process for df2, with the Team * and Team*_Avg columns.

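# Recompute the row positions for df2, in case it has a different length than df1
i = np.arange(len(df2))[:, None]
j = np.argsort(df2[['Team 1', 'Team 2']].values, axis=1)

df2[['Team 1', 'Team 2']] = df2[['Team 1', 'Team 2']].values[i, j]
df2[['Team1_Avg', 'Team2_Avg']] = df2[['Team1_Avg', 'Team2_Avg']].values[i, j]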

Now, perform a left outer merge -

df1.merge(df2, on=['Team 1', 'Team 2'], how='left')

  Team 1 Team 2  Team1_Score  Team2_Score Team1_Avg Team2_Avg
0      A      B           25           56         5        15
1      B      C           30           55        35        15
2      D      E           35           75                 
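
For reference, the steps above can be put together as one self-contained sketch, using the data from the question. The helper sort_pairs is a hypothetical name introduced here for illustration and is not part of pandas; it works on copies instead of modifying df1 and df2 in place.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Team 1": ["A", "B", "D"],
    "Team 2": ["B", "C", "E"],
    "Team1_Score": [25, 30, 35],
    "Team2_Score": [56, 55, 75],
})
df2 = pd.DataFrame({
    "Team 1": ["A", "G", "C"],
    "Team 2": ["B", "F", "B"],
    "Team1_Avg": [5, 10, 15],
    "Team2_Avg": [15, 25, 35],
})

def sort_pairs(df, team_cols, value_cols):
    """Hypothetical helper: order the two team names alphabetically within
    each row and reorder the paired value columns the same way."""
    out = df.copy()
    i = np.arange(len(out))[:, None]
    j = np.argsort(out[team_cols].to_numpy(), axis=1)
    out[team_cols] = out[team_cols].to_numpy()[i, j]
    out[value_cols] = out[value_cols].to_numpy()[i, j]
    return out

team_cols = ["Team 1", "Team 2"]
s1 = sort_pairs(df1, team_cols, ["Team1_Score", "Team2_Score"])
s2 = sort_pairs(df2, team_cols, ["Team1_Avg", "Team2_Avg"])

# Left merge keeps every row of df1; unmatched pairs get NaN averages.
combined = s1.merge(s2, on=team_cols, how="left")
print(combined)
```

One thing to keep in mind with this approach: rows of df1 whose teams are not already in alphabetical order will come out with Team 1 and Team 2 swapped relative to the input.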
answered 2017-12-26T16:44:32.947

What you can do is pd.concat() df2 with a copy of itself in which the column names are flipped. You can flip them with rename:

df3 = df2.rename(columns={'Team 1':'Team 2','Team 2':'Team 1', 
        'Team1_Avg':'Team2_Avg','Team2_Avg':'Team1_Avg'})

Now we can concat df2 with the newly created df3 and merge the result onto df1:

df1.merge(pd.concat([df2,df3]),how='left',on=['Team 1','Team 2'])

This gives you the desired DataFrame:

  Team 1 Team 2  Team1_Score  Team2_Score  Team1_Avg  Team2_Avg
0      A      B           25           56        5.0       15.0
1      B      C           30           55       35.0       15.0
2      D      E           35           75        NaN        NaN
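
As a self-contained sketch of this approach, using the data from the question (the variable names swap and both are introduced here for illustration only):

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team 1": ["A", "B", "D"],
    "Team 2": ["B", "C", "E"],
    "Team1_Score": [25, 30, 35],
    "Team2_Score": [56, 55, 75],
})
df2 = pd.DataFrame({
    "Team 1": ["A", "G", "C"],
    "Team 2": ["B", "F", "B"],
    "Team1_Avg": [5, 10, 15],
    "Team2_Avg": [15, 25, 35],
})

# Mirror df2: swap the team columns and the average columns together.
swap = {
    "Team 1": "Team 2", "Team 2": "Team 1",
    "Team1_Avg": "Team2_Avg", "Team2_Avg": "Team1_Avg",
}
df3 = df2.rename(columns=swap)

# concat aligns on column names, so the lookup table holds both orientations.
both = pd.concat([df2, df3], ignore_index=True)
combined = df1.merge(both, how="left", on=["Team 1", "Team 2"])
print(combined)
```

This leaves the team order in df1 untouched. A caveat, as far as I can tell: if df2 already contained a pair in both orders, the merge would return more than one row for that pair, so it may be worth checking for duplicates first.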
answered 2017-12-26T17:04:40.407