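I have two pandas DataFrames. Below is my scenario. This is only an example; in practice I will have millions of records and more than 100 columns. Basically, I need to compare the two DataFrames and create a third DataFrame that lists, for each cell that differs, the key, the column name, and the value from each DataFrame.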

Here is an example.

DataFrame 1:

EmpId         EmpName   LastName       Sal      Dept     BusinessUnit
10020         Victor    Oliver         12000    AI       Amazon
23100         Jen       Len            21220    Oracle   Google
41667         Roby      Alfredo        15000    Java     LinkedIn
55124         Chen      Frido          15662    Java     Facebook

DataFrame 2:

EmpId         EmpName   LastName       Sal      Dept     BusinessUnit
10020         Victor    Oliver         12000    AI       Amazon
23100         Jen       Len            31220    Oracle   AAA+
41667         Roby      Chan           15000    Java     LinkedIn
55124         Chen      Frido          15662    Java     Facebook

DataFrame 3 should then contain the result in the following format.

(EmpId is the index/primary key)

EmpId        Column_name     dataFrame1_data     dataFrame2_data
23100         Salary         21220                31220 
23100         BusinessUnit   Google               AAA+ 
41667         LastName       Alfredo              Chan

1 Answer

First, build the first DataFrame:

import pandas as pd

# First DataFrame, keyed by EmpId
df = pd.DataFrame([[10020, 'Victor', 'Oliver', 12000, 'AI', 'Amazon'],
                   [23100, 'Jen', 'Len', 21220, 'Oracle', 'Google'],
                   [41667, 'Roby', 'Alfredo', 15000, 'Java', 'LinkedIn'],
                   [55124, 'Chen', 'Frido', 15662, 'Java', 'Facebook']])
df.columns = ['EmpId', 'EmpName', 'LastName', 'Sal', 'Dept', 'BusinessUnit']

Output:

   EmpId EmpName LastName    Sal    Dept BusinessUnit
0  10020  Victor   Oliver  12000      AI       Amazon
1  23100     Jen      Len  21220  Oracle       Google
2  41667    Roby  Alfredo  15000    Java     LinkedIn
3  55124    Chen    Frido  15662    Java     Facebook

Then build the second DataFrame:

# Second DataFrame, same columns, with a few changed values
df2 = pd.DataFrame([[10020, 'Victor', 'Oliver', 12000, 'AI', 'Amazon'],
                    [23100, 'Jen', 'Len', 31220, 'Oracle', 'AAAA+'],
                    [41667, 'Roby', 'Chan', 15000, 'Java', 'LinkedIn'],
                    [55124, 'Chen', 'Frido', 15662, 'Java', 'Facebook']])
df2.columns = ['EmpId', 'EmpName', 'LastName', 'Sal', 'Dept', 'BusinessUnit']

Output:

   EmpId EmpName LastName    Sal    Dept BusinessUnit
0  10020  Victor   Oliver  12000      AI       Amazon
1  23100     Jen      Len  31220  Oracle        AAAA+
2  41667    Roby     Chan  15000    Java     LinkedIn
3  55124    Chen    Frido  15662    Java     Facebook

Melt both DataFrames into long format, so that each row holds one (EmpId, column name, value) triple:

df1_melt = pd.melt(df, id_vars=['EmpId'])
df2_melt = pd.melt(df2, id_vars=['EmpId'])

Output (df1_melt shown):

    EmpId      variable     value
0   10020       EmpName    Victor
1   23100       EmpName       Jen
2   41667       EmpName      Roby
3   55124       EmpName      Chen
4   10020      LastName    Oliver
5   23100      LastName       Len
6   41667      LastName   Alfredo
7   55124      LastName     Frido
8   10020           Sal     12000
9   23100           Sal     21220
10  41667           Sal     15000
11  55124           Sal     15662
12  10020          Dept        AI
13  23100          Dept    Oracle
14  41667          Dept      Java
15  55124          Dept      Java
16  10020  BusinessUnit    Amazon
17  23100  BusinessUnit    Google
18  41667  BusinessUnit  LinkedIn
19  55124  BusinessUnit  Facebook

Merge the two melted DataFrames on EmpId and variable, so the values from both sit side by side:

pd.merge(df1_melt, df2_melt, how='inner', on=['EmpId', 'variable'])

Output:

    EmpId      variable   value_x   value_y
0   10020       EmpName    Victor    Victor
1   23100       EmpName       Jen       Jen
2   41667       EmpName      Roby      Roby
3   55124       EmpName      Chen      Chen
4   10020      LastName    Oliver    Oliver
5   23100      LastName       Len       Len
6   41667      LastName   Alfredo      Chan
7   55124      LastName     Frido     Frido
8   10020           Sal     12000     12000
9   23100           Sal     21220     31220
10  41667           Sal     15000     15000
11  55124           Sal     15662     15662
12  10020          Dept        AI        AI
13  23100          Dept    Oracle    Oracle
14  41667          Dept      Java      Java
15  55124          Dept      Java      Java
16  10020  BusinessUnit    Amazon    Amazon
17  23100  BusinessUnit    Google     AAAA+
18  41667  BusinessUnit  LinkedIn  LinkedIn
19  55124  BusinessUnit  Facebook  Facebook
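
The merged table still contains every cell, including the unchanged ones. To get the format asked for in the question, one remaining step is to keep only the rows where value_x and value_y differ and rename the columns. The following is a minimal, self-contained sketch of that step. The names cols, merged, diff and df3 are illustrative assumptions, and the sketch assumes there are no missing values (NaN != NaN evaluates to True, so a cell that is missing in both frames would be reported as a difference).

```python
import pandas as pd

cols = ['EmpId', 'EmpName', 'LastName', 'Sal', 'Dept', 'BusinessUnit']
df = pd.DataFrame([[10020, 'Victor', 'Oliver', 12000, 'AI', 'Amazon'],
                   [23100, 'Jen', 'Len', 21220, 'Oracle', 'Google'],
                   [41667, 'Roby', 'Alfredo', 15000, 'Java', 'LinkedIn'],
                   [55124, 'Chen', 'Frido', 15662, 'Java', 'Facebook']],
                  columns=cols)
df2 = pd.DataFrame([[10020, 'Victor', 'Oliver', 12000, 'AI', 'Amazon'],
                    [23100, 'Jen', 'Len', 31220, 'Oracle', 'AAAA+'],
                    [41667, 'Roby', 'Chan', 15000, 'Java', 'LinkedIn'],
                    [55124, 'Chen', 'Frido', 15662, 'Java', 'Facebook']],
                   columns=cols)

# Same melt + merge as above: one row per (EmpId, column) pair,
# with the value from each frame in value_x / value_y.
merged = pd.merge(pd.melt(df, id_vars=['EmpId']),
                  pd.melt(df2, id_vars=['EmpId']),
                  how='inner', on=['EmpId', 'variable'])

# Keep only the cells that changed (assumes no NaN values, see note above).
diff = merged[merged['value_x'] != merged['value_y']]

# Rename to the column names requested in the question and group rows by EmpId.
# A stable sort keeps the original column order within each EmpId.
df3 = (diff.rename(columns={'variable': 'Column_name',
                            'value_x': 'dataFrame1_data',
                            'value_y': 'dataFrame2_data'})
           .sort_values('EmpId', kind='mergesort')
           .reset_index(drop=True))

print(df3)
```

With the sample data this leaves three rows: Sal and BusinessUnit for EmpId 23100, and LastName for EmpId 41667. The Column_name values are the actual column names, so the salary row is labelled Sal. Because the inner merge drops any EmpId present in only one frame, added or removed employees would need separate handling.
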
Answered 2019-10-03T15:12:22.157