I have a pandas DataFrame with two columns:

ddf.head()

    a    b
0   3136 13280
1   3072 13312
2   3152 13296
3   3120 13248
4   3120 13200

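I want to compute the difference between consecutive elements within each column. If I do this one column at a time (ddf['a'].diff()), it works as I expect, but if I try ddf.diff() it raises: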

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-68-6ff864856571> in <module>()
----> 1 ddf.diff()

/home/app/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in diff(self, periods)
   4285         diffed : DataFrame
   4286         """
-> 4287         new_data = self._data.diff(periods)
   4288         return self._constructor(new_data)
   4289 

/home/app/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in diff(self, *args, **kwargs)
   1287 
   1288     def diff(self, *args, **kwargs):
-> 1289         return self.apply('diff', *args, **kwargs)
   1290 
   1291     def interpolate(self, *args, **kwargs):

/home/app/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in apply(self, f, *args, **kwargs)
   1267                 applied = f(blk, *args, **kwargs)
   1268             else:
-> 1269                 applied = getattr(blk,f)(*args, **kwargs)
   1270 
   1271             if isinstance(applied,list):

/home/app/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in diff(self, n)
    423     def diff(self, n):
    424         """ return block for the diff of the values """
--> 425         new_values = com.diff(self.values, n, axis=1)
    426         return make_block(new_values, self.items, self.ref_items, fastpath=True)
    427 

/home/app/anaconda/lib/python2.7/site-packages/pandas/core/common.pyc in diff(arr, n, axis)
    643     if arr.ndim == 2 and arr.dtype.name in _diff_special:
    644         f = _diff_special[arr.dtype.name]
--> 645         f(arr, out_arr, n, axis)
    646     else:
    647         res_indexer = [slice(None)] * arr.ndim

/home/app/anaconda/lib/python2.7/site-packages/pandas/algos.so in pandas.algos.diff_2d_int16 (pandas/algos.c:91446)()

ValueError: Buffer dtype mismatch, expected 'float32_t' but got 'double'

1 Answer


You can use this:

>>> df - df.shift(1)
    a   b
0 NaN NaN
1 -64  32
2  80 -16
3 -32 -48
4   0 -48

But actually, on my machine, df.diff() works fine:

>>> df.diff()
    a   b
0 NaN NaN
1 -64  32
2  80 -16
3 -32 -48
4   0 -48
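
For reference, here is a minimal sketch of the ways to get the per-column differences discussed in this thread. The int16 dtype is an assumption inferred from the `diff_2d_int16` frame in the traceback, and the `astype` variant is an extra suggestion that is not in the question. The snippet is written against a recent pandas; whether each variant avoids the error on the affected older version has not been verified.

```python
import pandas as pd

# Sample data from the question. The int16 dtype is an assumption based on
# the `diff_2d_int16` frame in the traceback.
df = pd.DataFrame(
    {
        "a": [3136, 3072, 3152, 3120, 3120],
        "b": [13280, 13312, 13296, 13248, 13200],
    },
    dtype="int16",
)

# Option 1: subtract the frame shifted down by one row.
by_shift = df - df.shift(1)

# Option 2: diff each column separately, which the question reports as working.
by_column = df.apply(lambda col: col.diff())

# Option 3 (assumption, not from the thread): cast to float64 before diffing,
# so the integer-specific code path is not involved.
by_cast = df.astype("float64").diff()

print(by_shift)
```

All three produce NaN in the first row, since there is no previous element to subtract from. The result dtypes may differ between the variants, so compare values rather than dtypes.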
Answered 2013-11-12T21:18:48.103