I need to do some calculations based on the sum of a measure taken over consecutive days. For example:

import numpy as np
import pandas as pd
from pandas import Series
rng = pd.date_range('1/3/2000', periods=8)
rng = rng[:4].append(rng[5:])  # drop 2000-01-07 to create a gap
ts = Series(np.random.randn(7).astype('int'), index=rng)
ts

Out[1]:
2000-01-03    0
2000-01-04    0
2000-01-05    0
2000-01-06   -1
2000-01-08    0
2000-01-09   -2
2000-01-10   -1
dtype: int64
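Because `randn` draws random numbers, the series above changes on every run. Here is a minimal sketch that rebuilds exactly the values shown in `Out[1]`, so the rest of the thread can be reproduced. The hard-coded list is copied from that output, not from the original code.

```python
import pandas as pd

# Same index as above: eight days starting 2000-01-03, with 2000-01-07 dropped
rng = pd.date_range('1/3/2000', periods=8)
rng = rng[:4].append(rng[5:])
# Values copied from Out[1] instead of drawn at random
ts = pd.Series([0, 0, 0, -1, 0, -2, -1], index=rng)
print(ts)
```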

How can I sum the values over each run of consecutive days, so that I get something like this?

Out[2]:
2000-01-03   -1
2000-01-04   -1
2000-01-05   -1
2000-01-06   -1
2000-01-08   -3
2000-01-09   -3
2000-01-10   -3
dtype: int64

[EDIT] A similar problem has been solved in R.


Now that I've found the answer, the problem seems simpler than it looked:

import numpy as np
from pandas import Series

def ranks(series):
    """
    In an ORDERED series, this function identifies consecutive days,
    giving each group a unique number identifier. The argument must be
    a pandas Series with a datetime index.
    """
    # Gap between each timestamp and the previous one; the first is NaT
    td = series.index.to_series().diff()
    td.iloc[0] = np.timedelta64(1, 'D')
    res = []
    counter = 0
    for i in range(td.size):
        # A gap longer than one day starts a new group
        if td.iloc[i] > np.timedelta64(1, 'D'):
            counter += 1
        res.append(counter)
    return Series(res, index=series.index)
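The loop in `ranks` can also be written without explicit iteration. This is a sketch that assumes the same ordered, daily-resolution `DatetimeIndex`. A new group starts wherever the gap to the previous timestamp exceeds one day, so a cumulative sum of that boolean mask gives the same group numbers. The name `ranks_vectorized` is hypothetical.

```python
import pandas as pd

def ranks_vectorized(series):
    """Label runs of consecutive days in an ordered, datetime-indexed Series."""
    # Gap to the previous timestamp; the first entry is NaT
    gaps = series.index.to_series().diff()
    # True where a gap longer than one day starts a new run (NaT compares as False)
    new_run = gaps > pd.Timedelta(days=1)
    # Running count of run starts: 0 for the first run, 1 for the next, ...
    return pd.Series(new_run.cumsum().to_numpy(), index=series.index)

rng = pd.date_range('1/3/2000', periods=8)
rng = rng[:4].append(rng[5:])
ts = pd.Series([0, 0, 0, -1, 0, -2, -1], index=rng)
print(ranks_vectorized(ts))
```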

From here, pandas groupby takes care of the rest. ;-)

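from pandas import DataFrame

df = DataFrame({'val': ts, 'gr': ranks(ts)})
gr = df.groupby('gr')
# Attach each group's sum back onto its rows; the overlapping 'val'
# column comes out as val_x (original value) and val_y (group sum)
df.merge(gr.sum(), left_on='gr', right_index=True, how='outer')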
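If the goal is exactly the output in `Out[2]` (each row replaced by the total of its run), `groupby(...).transform('sum')` broadcasts the group sums back onto the original index, so no merge is needed. This sketch builds the group labels inline using the same rule as `ranks` (a gap longer than one day starts a new group).

```python
import pandas as pd

rng = pd.date_range('1/3/2000', periods=8)
rng = rng[:4].append(rng[5:])
ts = pd.Series([0, 0, 0, -1, 0, -2, -1], index=rng)

# Group label: number of gaps longer than one day seen so far
groups = (ts.index.to_series().diff() > pd.Timedelta(days=1)).cumsum()
# Sum within each run and broadcast the total back to every row of that run
run_sums = ts.groupby(groups).transform('sum')
print(run_sums)
```

Unlike the merge, this returns a Series aligned with `ts`, which matches the shape of the desired output.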

answered 2013-09-11T05:50:24.720