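I am looking for a good way to align DataFrames whose timestamps are at one-second resolution, without losing any data. Specifically, my problem is as follows: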
Here df1 is my "main" DataFrame:
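import numpy as np
import pandas as pd
# 19 consecutive one-second timestamps: 00:00:01 through 00:00:19
ind1 = pd.date_range("20120101", "20120102", freq="s")[1:20]
data1 = np.random.randn(len(ind1))
df1 = pd.DataFrame(data1, index=ind1)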
For example, df1 might look like this:
0
2012-01-01 00:00:01 2.738425
2012-01-01 00:00:02 -0.323905
2012-01-01 00:00:03 1.861855
2012-01-01 00:00:04 0.480284
2012-01-01 00:00:05 0.340270
2012-01-01 00:00:06 -1.139052
2012-01-01 00:00:07 -0.203018
2012-01-01 00:00:08 -0.398599
2012-01-01 00:00:09 -0.568802
2012-01-01 00:00:10 -1.539783
2012-01-01 00:00:11 -1.778668
2012-01-01 00:00:12 -1.488097
2012-01-01 00:00:13 0.889712
2012-01-01 00:00:14 -0.620267
2012-01-01 00:00:15 0.075169
2012-01-01 00:00:16 -0.091302
2012-01-01 00:00:17 -1.035364
2012-01-01 00:00:18 -0.459013
2012-01-01 00:00:19 -2.177190
In addition, I have another DataFrame, say df2:
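# Two disjoint runs of seconds: 00:00:02-00:00:06 and 00:00:12-00:00:18
ind21 = pd.date_range("20120101", "20120102", freq="s")[2:7]
ind22 = pd.date_range("20120101", "20120102", freq="s")[12:19]
ind2 = ind21.union(ind22)  # adding two DatetimeIndex objects with + raises in current pandas
data2 = np.random.randn(len(ind2))
df2 = pd.DataFrame(data2, index=ind2)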
df2 looks like this (note the non-periodic timestamps):
0
2012-01-01 00:00:02 -1.877779
2012-01-01 00:00:03 1.772659
2012-01-01 00:00:04 0.037251
2012-01-01 00:00:05 -1.195782
2012-01-01 00:00:06 -0.145339
2012-01-01 00:00:12 -0.220673
2012-01-01 00:00:13 -0.581469
2012-01-01 00:00:14 -0.520756
2012-01-01 00:00:15 -0.562677
2012-01-01 00:00:16 0.109325
2012-01-01 00:00:17 -0.195091
2012-01-01 00:00:18 0.838294
Now I join the two DataFrames and get:
df = df1.join(df2, lsuffix='A')
0A 0
2012-01-01 00:00:01 2.738425 NaN
2012-01-01 00:00:02 -0.323905 -1.877779
2012-01-01 00:00:03 1.861855 1.772659
2012-01-01 00:00:04 0.480284 0.037251
2012-01-01 00:00:05 0.340270 -1.195782
2012-01-01 00:00:06 -1.139052 -0.145339
2012-01-01 00:00:07 -0.203018 NaN
2012-01-01 00:00:08 -0.398599 NaN
2012-01-01 00:00:09 -0.568802 NaN
2012-01-01 00:00:10 -1.539783 NaN
2012-01-01 00:00:11 -1.778668 NaN
2012-01-01 00:00:12 -1.488097 -0.220673
2012-01-01 00:00:13 0.889712 -0.581469
2012-01-01 00:00:14 -0.620267 -0.520756
2012-01-01 00:00:15 0.075169 -0.562677
2012-01-01 00:00:16 -0.091302 0.109325
2012-01-01 00:00:17 -1.035364 -0.195091
2012-01-01 00:00:18 -0.459013 0.838294
2012-01-01 00:00:19 -2.177190 NaN
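This is fine, but I would like to replace the NaN values in column 0 with "minute-level" values from df2. In other words, I only want to fall back to the minute level when there is no exact match at the second level. The minute-level value could be a simple average of all df2 values within that particular minute (here: 2012-01-01 00:00:00).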
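To make the goal concrete, here is a rough sketch of one way the fallback might be done; it is not necessarily the best or intended approach. It assumes the minute-level value is the plain mean of df2 within each minute, computed with `DatetimeIndex.floor("min")` and `groupby`. The column names `main` and `other` and the variables `minute_mean` and `fallback` are only illustrative and do not come from the frames above.

```python
import numpy as np
import pandas as pd

# Rebuild frames shaped like df1 and df2 above (seeded random values).
rng = np.random.default_rng(0)
sec = pd.date_range("20120101", "20120102", freq="s")
df1 = pd.DataFrame(rng.standard_normal(19), index=sec[1:20])
ind2 = sec[2:7].union(sec[12:19])
df2 = pd.DataFrame(rng.standard_normal(len(ind2)), index=ind2)

# Give the two columns explicit (illustrative) names before joining.
df = df1[0].rename("main").to_frame().join(df2[0].rename("other"))

# Mean of df2 per minute, indexed by the start of each minute.
minute_mean = df2[0].groupby(df2.index.floor("min")).mean()

# For every row of the joined frame, look up the mean of its minute.
fallback = pd.Series(
    minute_mean.reindex(df.index.floor("min")).to_numpy(),
    index=df.index,
)

# Keep exact second-level matches; fill only the gaps with the minute mean.
df["other"] = df["other"].fillna(fallback)
```

With this sketch, rows that already matched at the second level keep their df2 value, and the remaining rows receive the mean of df2 for their minute. A minute in which df2 has no values at all would still be left as NaN.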
Thanks for your help!