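I have weather data stored in many separate files, where each column corresponds to a particular measuring instrument and each row holds the average reading for a particular date. Suppose one file looks like this: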
import numpy as np
import pandas as pd

first = pd.DataFrame(np.random.random((10, 3)),
                     index=pd.date_range('1950-01-01', periods=10),
                     columns=['A', 'B', 'C'])
first
Out[21]:
                  A        B        C
1950-01-01 0.939932 0.504543 0.091025
1950-01-02 0.121418 0.725333 0.444813
1950-01-03 0.338385 0.783398 0.116468
1950-01-04 0.847905 0.846147 0.226074
1950-01-05 0.156315 0.704804 0.524886
1950-01-06 0.412284 0.425379 0.427246
1950-01-07 0.165859 0.406347 0.114586
1950-01-08 0.392670 0.789526 0.174001
1950-01-09 0.246180 0.776304 0.019368
1950-01-10 0.142213 0.731748 0.954076
and a second file looks like this:
second = pd.DataFrame(np.random.random((10, 3)),
                      index=pd.date_range('1950-01-11', periods=10),
                      columns=['A', 'B', 'D'])
second
Out[30]:
                  A        B        D
1950-01-11 0.190767 0.905640 0.325411
1950-01-12 0.109964 0.754694 0.414402
1950-01-13 0.058164 0.305405 0.768333
1950-01-14 0.267644 0.919876 0.631083
1950-01-15 0.981333 0.454678 0.533075
1950-01-16 0.831600 0.823845 0.980366
1950-01-17 0.303585 0.091634 0.338517
1950-01-18 0.723445 0.088020 0.570779
1950-01-19 0.639665 0.954577 0.763810
1950-01-20 0.370629 0.716066 0.628383
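I want to merge the two so that all instruments (i.e. A, B, C, D, ...) appear in a single file that covers every measurement period. The expected result looks like this: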
                  A        B        C        D
1950-01-01 0.939932 0.504543 0.091025
1950-01-02 0.121418 0.725333 0.444813
1950-01-03 0.338385 0.783398 0.116468
1950-01-04 0.847905 0.846147 0.226074
1950-01-05 0.156315 0.704804 0.524886
1950-01-06 0.412284 0.425379 0.427246
1950-01-07 0.165859 0.406347 0.114586
1950-01-08 0.392670 0.789526 0.174001
1950-01-09 0.246180 0.776304 0.019368
1950-01-10 0.142213 0.731748 0.954076
1950-01-11 0.190767 0.905640          0.325411
1950-01-12 0.109964 0.754694          0.414402
1950-01-13 0.058164 0.305405          0.768333
1950-01-14 0.267644 0.919876          0.631083
1950-01-15 0.981333 0.454678          0.533075
1950-01-16 0.831600 0.823845          0.980366
1950-01-17 0.303585 0.091634          0.338517
1950-01-18 0.723445 0.088020          0.570779
1950-01-19 0.639665 0.954577          0.763810
1950-01-20 0.370629 0.716066          0.628383
To get this, I tried:
first.merge(second, how='outer', left_index=True, right_index=True)
Out[34]:
                A_x      B_x        C      A_y      B_y        D
1950-01-01 0.939932 0.504543 0.091025      NaN      NaN      NaN
1950-01-02 0.121418 0.725333 0.444813      NaN      NaN      NaN
1950-01-03 0.338385 0.783398 0.116468      NaN      NaN      NaN
1950-01-04 0.847905 0.846147 0.226074      NaN      NaN      NaN
1950-01-05 0.156315 0.704804 0.524886      NaN      NaN      NaN
1950-01-06 0.412284 0.425379 0.427246      NaN      NaN      NaN
1950-01-07 0.165859 0.406347 0.114586      NaN      NaN      NaN
1950-01-08 0.392670 0.789526 0.174001      NaN      NaN      NaN
1950-01-09 0.246180 0.776304 0.019368      NaN      NaN      NaN
1950-01-10 0.142213 0.731748 0.954076      NaN      NaN      NaN
1950-01-11      NaN      NaN      NaN 0.190767 0.905640 0.325411
1950-01-12      NaN      NaN      NaN 0.109964 0.754694 0.414402
1950-01-13      NaN      NaN      NaN 0.058164 0.305405 0.768333
1950-01-14      NaN      NaN      NaN 0.267644 0.919876 0.631083
1950-01-15      NaN      NaN      NaN 0.981333 0.454678 0.533075
1950-01-16      NaN      NaN      NaN 0.831600 0.823845 0.980366
1950-01-17      NaN      NaN      NaN 0.303585 0.091634 0.338517
1950-01-18      NaN      NaN      NaN 0.723445 0.088020 0.570779
1950-01-19      NaN      NaN      NaN 0.639665 0.954577 0.763810
1950-01-20      NaN      NaN      NaN 0.370629 0.716066 0.628383
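But as you can see, the columns that should have been combined (A and B) were split into A_x/A_y and B_x/B_y, because there are no common row indices. I feel this would be a very useful addition to pandas. Can this be done?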
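For reference, here is a minimal sketch of two approaches that seem to produce the layout described above. It assumes the two frames are rebuilt as in the snippets earlier (a seeded generator is used here only so the example is reproducible), and the names `stacked` and `combined` are purely illustrative. `pd.concat` stacks the rows and aligns columns by name, while `DataFrame.combine_first` takes the union of both indexes and columns.

```python
import numpy as np
import pandas as pd

# Rebuild the two example frames; the seed is an assumption made
# only so that the sketch is reproducible.
rng = np.random.default_rng(0)
first = pd.DataFrame(rng.random((10, 3)),
                     index=pd.date_range('1950-01-01', periods=10),
                     columns=['A', 'B', 'C'])
second = pd.DataFrame(rng.random((10, 3)),
                      index=pd.date_range('1950-01-11', periods=10),
                      columns=['A', 'B', 'D'])

# Option 1: stack the rows. Columns are matched by name, and any
# instrument missing from a file is filled with NaN.
stacked = pd.concat([first, second])

# Option 2: union of rows and columns, keeping values from `first`
# and filling its gaps from `second`.
combined = first.combine_first(second)

print(stacked)
```

The two options should agree as long as the files cover disjoint dates, as they do here. If the date ranges overlapped, `pd.concat` would keep both rows for a repeated date, whereas `combine_first` would keep one row per date and prefer the non-null values from `first`.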