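It sounds like you have "left" and "right" DataFrames and want the records that appear in only one of them. The code below returns the rows that exist only in the right DataFrame or only in the left one.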
import numpy as np
import pandas as pd

# Two 5x5 frames of random values that share the index labels 0..4
dataframe_left = pd.DataFrame(np.random.randn(5, 5), columns=['A', 'B', 'C', 'D', 'E'], index=np.arange(5))
dataframe_right = pd.DataFrame(np.random.randn(5, 5), columns=['A', 'B', 'C', 'D', 'E'], index=np.arange(5))

# One extra row per frame, under an index label the other frame does not have
insert_left = pd.DataFrame(np.arange(5).reshape(1, 5), columns=['A', 'B', 'C', 'D', 'E'], index=[7])
insert_right = pd.DataFrame(np.arange(5).reshape(1, 5), columns=['A', 'B', 'C', 'D', 'E'], index=[6])

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
dataframe_right = pd.concat([dataframe_right, insert_right])
dataframe_left = pd.concat([dataframe_left, insert_left])
The code above produces this output (the values are random, so yours will differ):
Left table (`dataframe_left`):

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | -0.3240086903973736 | 1.0441549453943946 | -0.23640436950107843 | 0.5466767470739027 | -0.2123693649877372 |
| 1 | -0.04263388410830733 | -0.4855492977594353 | -1.5584284407735072 | 1.2438524586306603 | -0.31087239909921277 |
| 2 | 0.6982581750529829 | -0.42379154444215905 | 1.1625089013522614 | -3.378898146269229 | 1.0550121763954057 |
| 3 | 0.3774337535208665 | 0.6402576096348337 | -0.2787520258645991 | 0.31071767629270125 | 0.34499495360962007 |
| 4 | -0.133649590435452 | 0.3679768579635411 | -2.0196709364730014 | 1.2860033685128436 | -0.49674737879741193 |
| 7 | 0.0 | 1.0 | 2.0 | 3.0 | 4.0 |
Right table (`dataframe_right`):

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | -0.09946693056759418 | -0.03378933704588447 | -0.4117873368048701 | 0.21976489856531914 | -0.7020527418892488 |
| 1 | -2.9936183481793233 | 0.42443360961021837 | -0.1681576564885903 | -0.5080538565354785 | -0.29483296271514153 |
| 2 | -0.6567306172004121 | -1.221239625798079 | -1.2604670988941196 | 0.44472543746187265 | -0.4562966381137614 |
| 3 | -0.0027697712245823482 | 0.1323767897141191 | -0.11073953230359104 | -0.3596157927825233 | 1.9894525572891626 |
| 4 | 0.5170901011452596 | -1.1694605240821456 | 0.29238712582282705 | -0.38912521589557797 | -0.8793074660039492 |
| 6 | 0.0 | 1.0 | 2.0 | 3.0 | 4.0 |
With the test DataFrames set up, we can join the two and filter for the rows we're interested in:
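# Outer-join on the index; indicator=True adds a "_merge" column that
# records whether each row came from the left frame, the right frame, or both
tmp = pd.merge(
    left=dataframe_left,
    right=dataframe_right,
    left_index=True,
    right_index=True,
    how='outer',
    suffixes=('_left', '_right'),
    indicator=True,
)
# Keep only the rows that were found on just one side
tmp[tmp['_merge'].isin(['right_only', 'left_only'])]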
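To make the role of `indicator=True` easier to see, here is a minimal sketch of the same merge-and-filter idea on tiny, deterministic frames (the names `left`, `right`, `only_left`, `only_right` and the data are hypothetical). It also shows how to split the result by side instead of filtering both sides at once:

```python
import pandas as pd

# Hypothetical data: label 0 is in both frames,
# label 1 only in `left`, label 2 only in `right`
left = pd.DataFrame({"A": [1.0, 2.0]}, index=[0, 1])
right = pd.DataFrame({"A": [10.0, 30.0]}, index=[0, 2])

merged = pd.merge(
    left,
    right,
    left_index=True,
    right_index=True,
    how="outer",                  # keep index labels from both frames
    suffixes=("_left", "_right"),  # overlapping column "A" -> "A_left"/"A_right"
    indicator=True,               # adds a categorical "_merge" column
)

# "_merge" holds one of "left_only", "right_only" or "both"
only_left = merged[merged["_merge"] == "left_only"]
only_right = merged[merged["_merge"] == "right_only"]
either_side = merged[merged["_merge"] != "both"]

print(merged)
```

Filtering with `!= "both"` is equivalent to the `isin(['right_only', 'left_only'])` check above, since those are the only three categories.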
This produces the following result:
|   | A_left | B_left | C_left | D_left | E_left | A_right | B_right | C_right | D_right | E_right | _merge |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 |  |  |  |  |  | 0.0 | 1.0 | 2.0 | 3.0 | 4.0 | right_only |
| 7 | 0.0 | 1.0 | 2.0 | 3.0 | 4.0 |  |  |  |  |  | left_only |
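If you only need the unmatched rows themselves and not the side-by-side suffixed columns, a lighter alternative is to compare the indexes directly with `Index.difference`. This is a different technique from the merge in the answer, and the sketch below assumes, as the merge does, that rows are matched by index label only (the frames and names are hypothetical):

```python
import pandas as pd

# Hypothetical data: label 0 is shared, 1 is left-only, 2 is right-only
left = pd.DataFrame({"A": [1.0, 2.0]}, index=[0, 1])
right = pd.DataFrame({"A": [10.0, 30.0]}, index=[0, 2])

# Rows whose index label does not appear in the other frame
left_only = left.loc[left.index.difference(right.index)]
right_only = right.loc[right.index.difference(left.index)]

# Index labels that appear in exactly one of the two frames
unmatched_labels = left.index.symmetric_difference(right.index)

print(left_only)
print(right_only)
print(list(unmatched_labels))
```

Unlike the merge, each result keeps its own frame's original column names, with no `_left`/`_right` suffixes and no NaN-filled columns from the other side.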