pandas - resample a Series/DataFrame with the frequency anchored to a specific time

Question

I have irregularly spaced, roughly once-per-second data with a time series index, like this:

from datetime import datetime

import numpy as np
import pandas as pd

# Irregularly spaced timestamps, roughly one per second
dates = [
    datetime(2012, 2, 5, 17, 0, 35, 327000),
    datetime(2012, 2, 5, 17, 0, 37, 325000),
    datetime(2012, 2, 5, 17, 0, 37, 776000),
    datetime(2012, 2, 5, 17, 0, 38, 233000),
    datetime(2012, 2, 5, 17, 0, 40, 946000),
    datetime(2012, 2, 5, 17, 0, 41, 327000),
    # Originally written as 06000, which Python 2 read as octal (3072 microseconds)
    datetime(2012, 2, 5, 17, 0, 42, 3072),
    datetime(2012, 2, 5, 17, 0, 44, 99000),
    datetime(2012, 2, 5, 17, 0, 44, 99000),
    datetime(2012, 2, 5, 17, 0, 46, 289000),
    datetime(2012, 2, 5, 17, 0, 49, 96000),
    datetime(2012, 2, 5, 17, 0, 53, 240000),
]

inhomogeneous_secondish_series = pd.Series(np.random.randn(len(dates)), name='some_col', index=pd.DatetimeIndex(dates))

In [26]: inhomogeneous_secondish_series
Out[26]: 
2012-02-05 17:00:35.327000   -0.903398
2012-02-05 17:00:37.325000    0.535798
2012-02-05 17:00:37.776000    0.847231
2012-02-05 17:00:38.233000   -1.280244
2012-02-05 17:00:40.946000    1.330232
2012-02-05 17:00:41.327000    2.287555
2012-02-05 17:00:42.003072   -1.469432
2012-02-05 17:00:44.099000   -1.174953
2012-02-05 17:00:44.099000   -1.020135
2012-02-05 17:00:46.289000   -0.200043
2012-02-05 17:00:49.096000   -0.665699
2012-02-05 17:00:53.240000    0.748638
Name: some_col

I'd like to resample this to, say, '5s'. Normally I would just do:

In [28]: inhomogeneous_secondish_series.resample('5s')

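This yields nicely resampled 5-second data anchored to second 0; every item in the resulting index is a multiple of 5 seconds counted from second 0 of the given minute: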

2012-02-05 17:00:40   -0.200153
2012-02-05 17:00:45   -0.009347
2012-02-05 17:00:50   -0.432871
2012-02-05 17:00:55    0.748638
Freq: 5S

How would I anchor the resampled data to the time of the most recent sample instead, so that the index looks like this:

...
2012-02-05 17:00:38.240000  (some correct resample value)
2012-02-05 17:00:43.240000  (some correct resample value)
2012-02-05 17:00:48.240000  (some correct resample value)
2012-02-05 17:00:53.240000  (some correct resample value)
Freq: 5S

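I expect the answer probably lies in the loffset argument to resample(), but I'm wondering whether there is a simpler way than calculating loffset before resampling. Do I have to look at the latest sample, work out its offset from the nearest regular 5s boundary, and feed that into loffset?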

score 1 · Accepted Answer

loffset only changes the labels; it does not change how your data is grouped into the new frequency. So, using your example:

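from datetime import timedelta

max_date = max(dates)
# Shift every label so the last one lands 1 microsecond before the last sample
offset = timedelta(seconds=(max_date.second % 5) - 5,
                   microseconds=max_date.microsecond - 1)
inhomogeneous_secondish_series.resample('5s', loffset=offset)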

will give you:

2012-02-05 17:00:38.239999   -0.200153
2012-02-05 17:00:43.239999   -0.009347
2012-02-05 17:00:48.239999   -0.432871
2012-02-05 17:00:53.239999    0.748638
Freq: 5S

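As I understand it, this is not what you want: the last value should be the mean of the last two values in the dataset, not just the last value on its own.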
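To make the point about labels versus grouping concrete, here is a small plain-Python sketch (no pandas involved; the variable names are illustrative only). It repeats the offset arithmetic from the snippet above and checks which of the last two samples falls into the last default bin, assuming right-closed, right-labelled 5-second bins as in the output shown in the question.

```python
from datetime import datetime, timedelta

# The last two samples from the question
last_two = [
    datetime(2012, 2, 5, 17, 0, 49, 96000),
    datetime(2012, 2, 5, 17, 0, 53, 240000),
]
max_date = max(last_two)

# Same arithmetic as the loffset computed in the answer
offset = timedelta(seconds=(max_date.second % 5) - 5,
                   microseconds=max_date.microsecond - 1)

# Last default bin, assumed to be (17:00:50, 17:00:55] with label 17:00:55
default_label = datetime(2012, 2, 5, 17, 0, 55)
bin_start = default_label - timedelta(seconds=5)
in_last_bin = [t for t in last_two if bin_start < t <= default_label]

# Shifting the label moves the timestamp only; bin membership is unchanged
shifted_label = default_label + offset

print(shifted_label)  # 2012-02-05 17:00:53.239999
print(in_last_bin)    # only the 17:00:53.240 sample
```

The 17:00:49.096 sample stays in the previous bin no matter what the label is shifted to, which is why the last resampled value equals the last raw value.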

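To change how the frequency is anchored, you can use base. However, because base has to be an integer, you should switch to a suitable microsecond frequency, for example: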

freq_base = (max_date.second % 5)*1000000 + max_date.microsecond
inhomogeneous_secondish_series.resample('5000000U', base=freq_base)
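
In newer pandas releases, loffset and base are no longer available (both were deprecated in 1.1 and removed in 2.0), and resample() needs an explicit aggregation such as .mean(). A rough equivalent, offered as a sketch rather than a drop-in replacement, is to anchor the bin edges with the origin parameter. The choices closed='right' and label='right' are assumptions made here to mimic the right-closed bins seen in the question's output, and the values are a simple 0..11 ramp instead of the random data, so the means are easy to check by hand.

```python
import numpy as np
import pandas as pd

times = pd.to_datetime([
    "2012-02-05 17:00:35.327", "2012-02-05 17:00:37.325",
    "2012-02-05 17:00:37.776", "2012-02-05 17:00:38.233",
    "2012-02-05 17:00:40.946", "2012-02-05 17:00:41.327",
    "2012-02-05 17:00:42.003072", "2012-02-05 17:00:44.099",
    "2012-02-05 17:00:44.099", "2012-02-05 17:00:46.289",
    "2012-02-05 17:00:49.096", "2012-02-05 17:00:53.240",
])
# Illustrative values 0.0 .. 11.0 (not the random values from the question)
s = pd.Series(np.arange(len(times), dtype=float), index=times, name="some_col")

# Anchor the 5-second bin edges to the most recent sample
anchored = s.resample(
    "5s",
    origin=s.index.max(),  # bin edges fall at last sample minus k * 5s
    closed="right",        # assumption: each bin includes its right edge
    label="right",         # assumption: each bin is labelled by its right edge
).mean()

# Labels end in :38.240, :43.240, :48.240, :53.240
# Means are 1.5, 5.0, 8.0, 10.5 for the ramp values above
print(anchored)
```

With this anchoring the last bin covers (17:00:48.240, 17:00:53.240], so it averages the last two samples, which is the behaviour asked for in the question.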
