I have successfully merged two DataFrames by nearest neighbor in time. My current intermediate result looks like this:
merge_key jd var2 index distance
2010-01-01 00:00:00 0 2455197.500000 0 2010-01-01 00:00:00 0
2010-01-01 00:06:00 0 2455197.500000 0 2010-01-01 00:00:00 -360
2010-01-01 00:12:00 0 2455197.500000 0 2010-01-01 00:00:00 -720
2010-01-01 00:18:00 1 2455197.517361 1 2010-01-01 00:25:00 420
2010-01-01 00:24:00 1 2455197.517361 1 2010-01-01 00:25:00 60
2010-01-01 00:30:00 1 2455197.517361 1 2010-01-01 00:25:00 -300
2010-01-01 00:36:00 1 2455197.517361 1 2010-01-01 00:25:00 -660
2010-01-01 00:42:00 2 2455197.534722 2 2010-01-01 00:50:00 480
2010-01-01 00:48:00 2 2455197.534722 2 2010-01-01 00:50:00 120
2010-01-01 00:54:00 2 2455197.534722 2 2010-01-01 00:50:00 -240
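In the next step, I want to drop the duplicate entries and keep, for each merge_key, only the row with the smallest absolute distance. I came up with: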
df.groupby("merge_key").apply(lambda x: x.ix[np.abs(x['distance']).idxmin()])
However, this results in:
merge_key jd var2 index distance
merge_key
0 0 2455198 0 2010-01-01 00:00:00 0
1 1 2455198 1 2010-01-01 00:25:00 60
2 2 2455198 2 2010-01-01 00:50:00 120
It looks like the data type of "jd" has changed to integer? I also don't want merge_key to become the new index.
The output I actually want is:
merge_key jd var2 index distance
2010-01-01 00:00:00 0 2455197.500000 0 2010-01-01 00:00:00 0
2010-01-01 00:24:00 1 2455197.517361 1 2010-01-01 00:25:00 60
2010-01-01 00:48:00 2 2455197.534722 2 2010-01-01 00:50:00 120
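For reference, here is a minimal sketch of one way that output might be produced. It is not necessarily the best approach. It assumes the index labels are unique, rebuilds the frame from the values shown above, and uses the illustrative names `closest` and `result`, which are my own. The idea is to compute only the index label of the closest row per group and then select whole rows with `.loc`. That way no row is ever converted into a mixed-type Series, which is possibly what alters "jd" in my attempt.

```python
import pandas as pd

# Rebuild the intermediate result from the question (values copied from above).
idx = pd.date_range("2010-01-01 00:00:00", periods=10, freq="6min")
df = pd.DataFrame(
    {
        "merge_key": [0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
        "jd": [2455197.5] * 3 + [2455197.517361] * 4 + [2455197.534722] * 3,
        "var2": [0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
        "index": pd.to_datetime(
            ["2010-01-01 00:00:00"] * 3
            + ["2010-01-01 00:25:00"] * 4
            + ["2010-01-01 00:50:00"] * 3
        ),
        "distance": [0, -360, -720, 420, 60, -300, -660, 480, 120, -240],
    },
    index=idx,
)

# For each merge_key, get the index label of the row with the smallest |distance|.
# On ties, idxmin returns the first occurrence.
closest = df["distance"].abs().groupby(df["merge_key"]).idxmin()

# Select the full rows by label: column dtypes are preserved and the
# original DatetimeIndex is kept instead of merge_key.
result = df.loc[closest.values]
print(result)
```

Because only labels pass through the groupby, "jd" stays float64 and merge_key remains an ordinary column.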