My initial data.head() output is:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 45993 entries, 2009-11-17 14:14:00 to 2012-12-16 14:26:00
Data columns (total 4 columns):
rain         45993  non-null values
temp         45993  non-null values
windspeed    45993  non-null values
dew_point    45993  non-null values
dtypes: float64(4)

2009-11-17 14:14:00  0   22.5    4.9     12.3
2009-11-17 14:44:00  0   22.3    6.1     12.1
2009-11-17 15:14:00  0   22.1    5.3     12.5
2009-11-17 15:44:00  0   22.2    3.3     12.0
2009-11-17 16:14:00  0   20.4    4.9     11.7

When I resample:

data = data.resample('30min', how='sum')
data.head()

I get:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 68861 entries, 2009-01-12 00:00:00 to 2012-12-16 14:00:00
Freq: 30T
Data columns (total 4 columns):
rain         45987  non-null values
temp         45987  non-null values
windspeed    45987  non-null values
dew_point    45987  non-null values
dtypes: float64(4)

2009-01-12 00:00:00  0   17.4    7.1     14.6
2009-01-12 00:30:00  0   17.4    7.2     14.7
2009-01-12 01:00:00  0   18.0    10.5    14.3
2009-01-12 01:30:00  0   18.3    9.6     14.2
2009-01-12 02:00:00  0   18.4    10.8    14.8

As you can see, my initial start date is 2009-11-17 14:14:00, but the resampled data starts at 2009-01-12. Can anyone explain why this happens?

Edit: I did find the problem, so for the benefit of anyone else, the supplied dataset had:

2009-01-12 00:00:00 value
2009-01-12 00:30:00 value ... but the next line was!!!!!
2009-01-12 01:00    value

So the missing :00 seconds caused all the confusion.
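
For anyone hitting something similar, here is a minimal sketch of a check that would have flagged the odd line before resampling. It assumes the timestamps are available as plain strings and are all meant to follow the YYYY-MM-DD HH:MM:SS layout; raw_timestamps and find_malformed are hypothetical names made up for illustration, not part of pandas or of my original script.

```python
from datetime import datetime

# Hypothetical sample: the three timestamp strings quoted above.
raw_timestamps = [
    "2009-01-12 00:00:00",
    "2009-01-12 00:30:00",
    "2009-01-12 01:00",  # seconds are missing
]

# Assumed layout that every line should follow.
EXPECTED_FORMAT = "%Y-%m-%d %H:%M:%S"


def find_malformed(timestamps, fmt=EXPECTED_FORMAT):
    """Return (position, text) pairs for strings that do not match fmt."""
    malformed = []
    for position, text in enumerate(timestamps):
        try:
            # strptime raises ValueError when the string does not fit fmt.
            datetime.strptime(text, fmt)
        except ValueError:
            malformed.append((position, text))
    return malformed


print(find_malformed(raw_timestamps))
```

Running a check like this on the raw file first should point straight at the lines that need fixing, rather than leaving the inconsistency to surface later in the resampled index.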
