In the following example:

import datetime
import pandas

base = datetime.datetime.now()
rr = [base - datetime.timedelta(days=x) for x in range(23)]
ee = [base - datetime.timedelta(days=x+3) for x in range(23)]
qq = pandas.DataFrame(data=rr, index=ee, columns=['datacol'])

qq.index - qq.datacol.values

The last line raises a TypeError:

In [11]: qq.index-qq.datacol.values
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-e850e726adac> in <module>()
----> 1 qq.index-qq.datacol.values

/usr/local/lib/python2.7/dist-packages/pandas/tseries/index.pyc in __sub__(self, other)
    556             return self.shift(-other)
    557         else:  # pragma: no cover
--> 558             raise TypeError(other)
    559 
    560     def _add_delta(self, delta):

TypeError: ['2013-11-08T21:18:50.478689000-0800' '2013-11-07T21:18:50.478689000-0800'

How can I get the difference between the index and the column?

Note: both come from datetime objects, but the index was automatically converted to Timestamps.

1 Answer

Here is an example that demonstrates your problem:

In [11]: rng = pd.date_range('2012-01-01', '2012-01-06')

In [12]: df = pd.DataFrame(rng, rng + 10)

In [13]: df
Out[13]: 
                             0
2012-01-11 2012-01-01 00:00:00
2012-01-12 2012-01-02 00:00:00
2012-01-13 2012-01-03 00:00:00
2012-01-14 2012-01-04 00:00:00
2012-01-15 2012-01-05 00:00:00
2012-01-16 2012-01-06 00:00:00

You can do the subtraction directly in numpy (index minus column 0):

In [14]: df.index.values - df[0].values
Out[14]: 
array([864000000000000, 864000000000000, 864000000000000, 864000000000000,
       864000000000000, 864000000000000], dtype='timedelta64[ns]')

And convert the result to a Series:

In [15]: pd.Series(df.index.values - df[0].values)
Out[15]: 
0   10 days, 00:00:00
1   10 days, 00:00:00
2   10 days, 00:00:00
3   10 days, 00:00:00
4   10 days, 00:00:00
5   10 days, 00:00:00
dtype: timedelta64[ns]
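
Applied to the frame from the question, the same idea might look like the sketch below. It is only an illustration: `base` is pinned to a fixed timestamp so the result is reproducible, and the names `diff` and `diff_idx` are made up here. The second variant assumes a reasonably recent pandas, where subtracting one DatetimeIndex from another yields a TimedeltaIndex.

```python
import datetime
import pandas as pd

# Fixed base time (an assumption for reproducibility; the question used "now").
base = datetime.datetime(2013, 11, 8, 21, 18, 50)
rr = [base - datetime.timedelta(days=x) for x in range(23)]
ee = [base - datetime.timedelta(days=x + 3) for x in range(23)]
qq = pd.DataFrame(data=rr, index=ee, columns=['datacol'])

# Subtract the underlying numpy datetime64 arrays, then wrap the
# timedelta64 result in a Series that keeps the original index.
diff = pd.Series(qq.index.values - qq['datacol'].values, index=qq.index)

# Alternative, assuming a recent pandas: convert the column to a
# DatetimeIndex and subtract index from index.
diff_idx = qq.index - pd.DatetimeIndex(qq['datacol'])

print(diff.head())
```

Going through `.values` sidesteps the index arithmetic that raised the TypeError, at the cost of dropping to raw numpy arrays for one step.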

Honestly, I think this part of pandas (timedeltas) is being improved at the moment, so later versions may offer a better way...

Answered 2013-11-08T20:26:01.710