The output of my initial data.head() is:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 45993 entries, 2009-11-17 14:14:00 to 2012-12-16 14:26:00
Data columns (total 4 columns):
rain 45993 non-null values
temp 45993 non-null values
windspeed 45993 non-null values
dew_point 45993 non-null values
dtypes: float64(4)
2009-11-17 14:14:00 0 22.5 4.9 12.3
2009-11-17 14:44:00 0 22.3 6.1 12.1
2009-11-17 15:14:00 0 22.1 5.3 12.5
2009-11-17 15:44:00 0 22.2 3.3 12.0
2009-11-17 16:14:00 0 20.4 4.9 11.7
When I resample:
data = data.resample('30min', how ='sum')
data.head()
I get:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 68861 entries, 2009-01-12 00:00:00 to 2012-12-16 14:00:00
Freq: 30T
Data columns (total 4 columns):
rain 45987 non-null values
temp 45987 non-null values
windspeed 45987 non-null values
dew_point 45987 non-null values
dtypes: float64(4)
2009-01-12 00:00:00 0 17.4 7.1 14.6
2009-01-12 00:30:00 0 17.4 7.2 14.7
2009-01-12 01:00:00 0 18.0 10.5 14.3
2009-01-12 01:30:00 0 18.3 9.6 14.2
2009-01-12 02:00:00 0 18.4 10.8 14.8
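As you can see, my initial date is 2009-11-17 14:14:00, but the resampled data starts at 2009-01-12. Can anyone explain why this happens?
Edit: I did find the problem, so for anyone else who runs into this, the supplied dataset had: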
2009-01-12 00:00:00 value
2009-01-12 00:30:00 value ... but the next line was!!!!!
2009-01-12 01:00 value
So the missing :00 seconds caused all the confusion.
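For anyone hitting something similar, here is a minimal sketch of how one could check the raw timestamp strings before handing them to pandas. It uses only the standard library. The sample strings and the helpers `find_malformed` and `normalize` are made up for illustration, and it assumes the well-formed rows look like `YYYY-MM-DD HH:MM:SS`; it is not what I actually ran at the time.

```python
from datetime import datetime

# Hypothetical sample of raw timestamp strings; the last one lacks ":00" seconds.
raw_timestamps = [
    "2009-01-12 00:00:00",
    "2009-01-12 00:30:00",
    "2009-01-12 01:00",
]

EXPECTED_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format of the well-formed rows
SHORT_FORMAT = "%Y-%m-%d %H:%M"        # assumed format of the rows missing seconds


def find_malformed(timestamps, fmt=EXPECTED_FORMAT):
    """Return (position, text) pairs for strings that do not match fmt."""
    bad = []
    for pos, text in enumerate(timestamps):
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            bad.append((pos, text))
    return bad


def normalize(text):
    """Parse a timestamp with or without seconds and return it with seconds."""
    for fmt in (EXPECTED_FORMAT, SHORT_FORMAT):
        try:
            return datetime.strptime(text, fmt).strftime(EXPECTED_FORMAT)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")


malformed = find_malformed(raw_timestamps)       # rows that need attention
cleaned = [normalize(t) for t in raw_timestamps]  # all rows in one format
print(malformed)
print(cleaned)
```

Running a strict check like this on the timestamp column first makes an odd row stand out immediately, rather than showing up later as a strange start date after resampling.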