我有一个类似于Merge two dataframes with multi-index的问题。</p>
在:
import pandas as pd
import numpy as np
row_x1 = ['a1','b1','c1']
row_x2 = ['a2','b2','c2']
row_x3 = ['a3','b3','c3']
row_x4 = ['a4','b4','c4']
index_arrays = [np.array(['first', 'first', 'second', 'second']), np.array(['one','two','one','two'])]
df1 = pd.DataFrame([row_x1,row_x2,row_x3,row_x4], columns=list('ABC'), index=index_arrays)
print(df1)
出去:
A B C
first one a1 b1 c1
two a2 b2 c2
second one a3 b3 c3
two a4 b4 c4
在:
row_y1 = ['d1','e1','f1']
row_y2 = ['d2','e2','f2']
row_y3 = ['d3','e3','f3']
index_arrays = [np.array(['first','first', 'second',]), np.array(['one','three','two'])]
df2 = pd.DataFrame([row_y1,row_y2,row_y3], columns=list('DEF'), index=index_arrays)
print(df2)
出去:
D E F
first one d1 e1 f1
three d2 e2 f2
second two d3 e3 f3
换句话说,我如何合并它们以实现 df3(如下)?
在:
row_x1 = ['a1','b1','c1']
row_x2 = ['a2','b2','c2']
row_x3 = ['a3','b3','c3']
row_x4 = ['a4','b4','c4']
row_y1 = ['d1','e1','f1']
row_y2 = ['d2','e2','f2']
row_y3 = ['d3','e3','f3']
row_z1 = row_x1 + row_y1
row_z2 = row_x2 + [np.nan, np.nan, np.nan]
row_z3 = [np.nan, np.nan, np.nan] + row_y2
row_z4 = row_x3 + [np.nan, np.nan, np.nan]
row_z5 = row_x4 + row_y3
index_arrays = [np.array(['first', 'first', 'first', 'second', 'second']), np.array(['one','two','three','one','two'])]
df3 = pd.DataFrame([row_z1,row_z2,row_z3,row_z4,row_z5], columns=list('ABCDEF'), index=index_arrays)
print(df3)
出去:
A B C D E F
first one a1 b1 c1 d1 e1 f1
two a2 b2 c2 NaN NaN NaN
three NaN NaN NaN d2 e2 f2
second one a3 b3 c3 NaN NaN NaN
two a4 b4 c4 d3 e3 f3
PS。感谢@Andreuccio 的问题!
感谢@Ajay Verma 和@EBDS。这确实是手动创建 df 数据的解决方案。但我对以下情况感到非常困惑:
我有两个来自统计数据的数据框。然后我为 pd.merge() 复制了相应的数据
在:
df1 = data1[data1.index.get_level_values(0) == 'BASIC_GZAG_TMB'].copy()
出去:
0 1 2 3
BASIC_GZAG_TMB 1 127.0 179.0 190.0 239.0
2 38.0 23.0 21.0 29.0
3 37.0 27.0 32.0 37.0
4 5.0 14.0 11.0 23.0
5 31.0 56.0 41.0 65.0
7 389.0 258.0 337.0 243.0
NaN 1323.0 1388.0 1307.0 1311.0
在:
df2 = data2[data2.index.get_level_values(0) == 'BASIC_GZAG_TMB'].copy()
出去:
0 1 2 3
BASIC_GZAG_TMB 1 207.0 232.0 252.0 223.0
2 26.0 18.0 19.0 20.0
3 43.0 41.0 50.0 42.0
4 35.0 27.0 37.0 15.0
5 54.0 52.0 78.0 64.0
6 1.0 1306.0 1.0 4.0
7 206.0 263.0 227.0 230.0
NaN 1374.0 1306.0 1282.0 1348.0
然后我通过以下方式合并 df1 和 df2:
df1.merge(df2, left_index=True, right_index=True, how='outer')
出去:
0_x 1_x 2_x 3_x 0_y 1_y 2_y \
BASIC_GZAG_TMB 1 127.0 179.0 190.0 239.0 207.0 232.0 252.0
2 38.0 23.0 21.0 29.0 26.0 18.0 19.0
3 37.0 27.0 32.0 37.0 43.0 41.0 50.0
4 5.0 14.0 11.0 23.0 35.0 27.0 37.0
5 31.0 56.0 41.0 65.0 54.0 52.0 78.0
7 389.0 258.0 337.0 243.0 206.0 263.0 227.0
NaN 1323.0 1388.0 1307.0 1311.0 1374.0 1306.0 1282.0
3_y
BASIC_GZAG_TMB 1 223.0
2 20.0
3 42.0
4 15.0
5 64.0
7 230.0
NaN 1348.0
我对 df2 中存在的 6 的索引在结果中消失了感到困惑。
我知道如果我使用 df2.merge(df1...) 可以成为一个解决方案。但其实data1和data2是动态生成的,不知道哪个有更多的索引。我只想得到 df1 和 df2 的联合。



