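You'll need pandas 0.11-dev. I think this gives you what you're looking for. See this section for more information, since timedeltas are a newer data type in pandas: http://pandas.pydata.org/pandas-docs/dev/timeseries.html#time-deltas
Here's your data (I split long/lat apart for convenience; the key point is that the condition column is bool)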
In [137]: df = pd.read_csv(StringIO.StringIO(data),index_col=0,parse_dates=True)
In [138]: df
Out[138]:
                       date  long lat condition
2013-02-05 19:45:00  39.940   -86.159      True
2013-02-05 19:50:00  39.940   -86.159      True
2013-02-05 19:55:00  39.940   -86.159     False
2013-02-05 20:00:00  39.777   -85.995     False
2013-02-05 20:05:00  39.775   -85.978      True
2013-02-05 20:10:00  39.775   -85.978      True
2013-02-05 20:15:00  39.775   -85.978     False
2013-02-05 20:20:00  39.940   -86.159      True
2013-02-05 20:25:00  39.940   -86.159     False
In [139]: df.dtypes
Out[139]:
date         float64
long lat     float64
condition       bool
dtype: object
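The session above uses the Python 2 `StringIO` module. A minimal sketch of the same load on Python 3 follows. The original `data` string is not shown, so the CSV text here is an assumption: comma-separated, with `long` and `lat` as separate columns and only the first three rows of the table.

```python
import io

import pandas as pd

# Assumed CSV text; the original `data` string is not shown in the answer.
data = """date,long,lat,condition
2013-02-05 19:45:00,39.940,-86.159,True
2013-02-05 19:50:00,39.940,-86.159,True
2013-02-05 19:55:00,39.940,-86.159,False
"""

# The first column becomes a DatetimeIndex; True/False is parsed as bool.
df = pd.read_csv(io.StringIO(data), index_col=0, parse_dates=True)
```

What matters for the steps that follow is that the index holds datetimes and `condition` is a boolean column.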
Create some date columns from the index (these are datetime64[ns] dtype)
In [140]: df['date'] = df.index
In [141]: df['rdate'] = df.index
Set the rdate column to NaT where condition is False (np.nan is converted to NaT)
In [142]: df.loc[~df['condition'],'rdate'] = np.nan
Forward-fill the NaT values from the previous value
In [143]: df['rdate'] = df['rdate'].ffill()
Subtract rdate from date; this produces a time-difference column of dtype timedelta64[ns]
In [144]: df['diff'] = df['date']-df['rdate']
In [151]: df
Out[151]:
                                   date  long lat condition               rdate  \
2013-02-05 19:45:00 2013-02-05 19:45:00   -86.159      True 2013-02-05 19:45:00
2013-02-05 19:50:00 2013-02-05 19:50:00   -86.159      True 2013-02-05 19:50:00
2013-02-05 19:55:00 2013-02-05 19:55:00   -86.159     False 2013-02-05 19:50:00
2013-02-05 20:00:00 2013-02-05 20:00:00   -85.995     False 2013-02-05 19:50:00
2013-02-05 20:05:00 2013-02-05 20:05:00   -85.978      True 2013-02-05 20:05:00
2013-02-05 20:10:00 2013-02-05 20:10:00   -85.978      True 2013-02-05 20:10:00
2013-02-05 20:15:00 2013-02-05 20:15:00   -85.978     False 2013-02-05 20:10:00
2013-02-05 20:20:00 2013-02-05 20:20:00   -86.159      True 2013-02-05 20:20:00
2013-02-05 20:25:00 2013-02-05 20:25:00   -86.159     False 2013-02-05 20:20:00

                        diff
2013-02-05 19:45:00 00:00:00
2013-02-05 19:50:00 00:00:00
2013-02-05 19:55:00 00:05:00
2013-02-05 20:00:00 00:10:00
2013-02-05 20:05:00 00:00:00
2013-02-05 20:10:00 00:00:00
2013-02-05 20:15:00 00:05:00
2013-02-05 20:20:00 00:00:00
2013-02-05 20:25:00 00:05:00
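The mask, forward-fill, and subtract steps can also be sketched as one self-contained script. This is a sketch under assumptions, not the code from the session: it builds the frame directly instead of reading CSV, uses `Series.where` in place of assigning `np.nan`, and the names `stamps`, `masked`, and `diff_prev` are made up for illustration. The second variant shifts the masked series by one row before filling, so that only strictly earlier True rows count as the reference point; which of the two is wanted depends on the question being asked.

```python
import pandas as pd

# Same nine timestamps and condition flags as in the table above.
idx = pd.date_range("2013-02-05 19:45", periods=9, freq="5min")
df = pd.DataFrame(
    {"condition": [True, True, False, False, True, True, False, True, False]},
    index=idx,
)

stamps = df.index.to_series()

# Keep the timestamp where condition is True, NaT elsewhere.
masked = stamps.where(df["condition"])

# Variant 1: time since the most recent True row, current row included.
df["diff"] = stamps - masked.ffill()

# Variant 2: shift first, so only strictly earlier True rows count.
# The first row has no earlier row, so its value stays NaT.
df["diff_prev"] = stamps - masked.shift(1).ffill()
```

With variant 1 a True row always gets a difference of zero; with variant 2 it gets the gap back to the previous True row.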
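The diff column is now timedelta64[ns], so you'll want it as integer minutes (FYI, this is a bit clunky right now, since pandas has no scalar Timedelta type analogous to Timestamp)
(Also, you may have to do a shift() on the rdate series before filling; I think I'm off by 1 somewhere)... but that's the idea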
In [175]: df['diff'].map(lambda x: x.item().seconds/60)
Out[175]:
2013-02-05 19:45:00 0
2013-02-05 19:50:00 0
2013-02-05 19:55:00 5
2013-02-05 20:00:00 10
2013-02-05 20:05:00 0
2013-02-05 20:10:00 0
2013-02-05 20:15:00 5
2013-02-05 20:20:00 0
2013-02-05 20:25:00 5
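pandas has since gained a scalar `Timedelta` type and a `.dt` accessor on timedelta columns, so the conversion no longer has to go through `.item()`. A minimal sketch, assuming a pandas version that provides `Series.dt.total_seconds()`; the sample series `diff` is made up for illustration. One reason to prefer it: `.seconds` is only the seconds component of a timedelta and ignores whole days, while `total_seconds()` covers the full duration.

```python
import pandas as pd

# Illustrative timedelta column; the last value spans more than one day.
diff = pd.Series(
    pd.to_timedelta(["00:00:00", "00:05:00", "00:10:00", "1 days 00:05:00"])
)

# Whole minutes as integers, counting full days as well.
minutes = (diff.dt.total_seconds() // 60).astype(int)
```

The `astype(int)` step assumes the column has no NaT values; if it does, keep the float result or fill the gaps first.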