python - Pandas: Resample hourly data for each group

Question

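I have a DataFrame containing GPS positions of vehicles, received at different times of the day. For each vehicle, I want to resample the data hourly so that I get the median report (by timestamp) for each hour of the day. For hours with no corresponding rows, I want a blank row. I am using the following code: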

idx = list(range(24))  # hours 0 to 23
for j in df.id.unique():
    data = df.loc[df.id == j].copy()  # copy to avoid modifying a slice of df
    data['hour'] = data['timestamp'].dt.hour
    data_grouped = data.groupby(['imo', 'hour']).median().reset_index()
    data = data_grouped.set_index('hour').reindex(idx).reset_index()

Since my DataFrame has millions of ids, iterating over all of them takes a lot of time. Is there an efficient way to do this?

Unlike "Pandas reindex dates in Groupby", I have multiple rows per hour, except that some hours have no rows at all.

score 1 · Accepted Answer

Tested in the latest version of pandas: convert the hour column to a categorical with all possible categories, then aggregate without a loop:

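# Declare all 24 hours as categories so that hours without rows are still known
df['hour'] = pd.Categorical(df['timestamp'].dt.hour, categories=range(24))
# observed=False keeps unobserved hours in the result (filled with NaN)
df1 = df.groupby(['id','imo','hour'], observed=False).median().reset_index()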
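
To see the effect on a small case, here is a minimal sketch of the categorical approach. The sample data and the column names lat and lon are made up for illustration (the question does not show its columns), and the grouping uses only id to keep the output small:

```python
import pandas as pd

# Hypothetical sample data: two vehicles, a few GPS reports each.
df = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "timestamp": pd.to_datetime([
        "2021-01-01 00:10", "2021-01-01 00:50",
        "2021-01-01 05:30", "2021-01-01 23:15",
    ]),
    "lat": [10.0, 12.0, 20.0, 30.0],
    "lon": [1.0, 3.0, 5.0, 7.0],
})

# Hour as a categorical with all 24 possible values.
df["hour"] = pd.Categorical(df["timestamp"].dt.hour, categories=range(24))

# observed=False asks pandas to emit a row for every category,
# including hours that never occur for a given id (values become NaN).
hourly = (
    df.groupby(["id", "hour"], observed=False)[["lat", "lon"]]
      .median()
      .reset_index()
)

print(hourly.head())
```

Each id should end up with 24 rows, with NaN in lat and lon for the hours that had no reports. One thing worth checking on real data: with several grouping keys and observed=False, pandas appears to expand the result to every combination of the key values, so grouping by both id and imo may create rows for id/imo pairs that never occur together, which can get large with millions of ids.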
