
When I resample some data, pandas drops the first row. See the example below. Note that if you move the last timestamp forward by 1 second, it works as expected.

I am using pandas 0.10.1.

import pandas as pd

from datetime import datetime
from StringIO import StringIO


f = StringIO('''\
time,value
2011-06-03 00:00:05,0
2011-06-03 00:01:05,1
2011-06-03 00:02:05,2
''')

series = pd.read_csv(f, parse_dates=True, index_col=0)['value']

print series
# time
# 2011-06-03 00:00:05    0
# 2011-06-03 00:01:05    1
# 2011-06-03 00:02:05    2
# Name: value

# Problem resampling: 1st sample is missing

print series.resample('s')
# time
# 2011-06-03 00:00:06   NaN
# 2011-06-03 00:00:07   NaN
# 2011-06-03 00:00:08   NaN
# 2011-06-03 00:00:09   NaN
# ...
# 2011-06-03 00:01:52   NaN
# 2011-06-03 00:02:03   NaN
# 2011-06-03 00:02:04   NaN
# 2011-06-03 00:02:05     2
# 2011-06-03 00:02:06   NaN
# Freq: S, Name: value, Length: 121

1 Answer


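The default value of the `closed` argument changed in 0.11, see here. I don't know whether there was also a bug in that version. You could try specifying `closed` explicitly.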
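A minimal sketch of specifying the interval convention explicitly, using the data from the question. It is written against the current pandas API rather than 0.10.1, which is an assumption on my part: in recent releases `resample` returns a deferred object, so an aggregation such as `.mean()` is required, and `io.StringIO` replaces the Python 2 `StringIO` module.

```python
import io

import pandas as pd

csv = io.StringIO("""\
time,value
2011-06-03 00:00:05,0
2011-06-03 00:01:05,1
2011-06-03 00:02:05,2
""")
series = pd.read_csv(csv, parse_dates=True, index_col=0)["value"]

# Spell out which side of each one-second bin is closed and which edge
# labels it, instead of relying on a version-dependent default.
left = series.resample("s", closed="left", label="left").mean()
right = series.resample("s", closed="right", label="right").mean()

print(left.head())
print(right.head())
```

Comparing the two results on your own version should show whether the first sample is lost under either convention.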

The current pandas version is 0.12 (0.13 is coming soon). The best option is to upgrade.

As of 0.12 this looks fine. The default is `closed='left'`:

In [11]: df
Out[11]: 
                     value
time                      
2011-06-03 00:00:05      0
2011-06-03 00:01:05      1
2011-06-03 00:02:05      2

In [12]: df.index
Out[12]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-06-03 00:00:05, ..., 2011-06-03 00:02:05]
Length: 3, Freq: None, Timezone: None

In [13]: df.resample('1s')
Out[13]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 121 entries, 2011-06-03 00:00:05 to 2011-06-03 00:02:05
Freq: S
Data columns (total 1 columns):
value    3  non-null values
dtypes: float64(1)

In [14]: df.resample('1s').head()
Out[14]: 
                     value
time                      
2011-06-03 00:00:05      0
2011-06-03 00:00:06    NaN
2011-06-03 00:00:07    NaN
2011-06-03 00:00:08    NaN
2011-06-03 00:00:09    NaN
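For readers on a recent pandas release, the same check could be sketched as follows. This assumes the deferred `resample` API, where `.asfreq()` reindexes to the new frequency without aggregating. The frame is rebuilt inline so the snippet is self-contained.

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [0, 1, 2]},
    index=pd.DatetimeIndex(
        ["2011-06-03 00:00:05", "2011-06-03 00:01:05", "2011-06-03 00:02:05"],
        name="time",
    ),
)

# Upsample to one-second frequency; rows without a sample become NaN.
upsampled = df.resample("1s").asfreq()

print(len(upsampled))
print(upsampled.head())
```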
answered 2013-09-27T12:07:48.483