I have a pandas DataFrame with a MultiIndex on ID and timestamp. I want to compute a time-series rolling sum for each ID, but I can't figure out how to do it without loops.

import io
import pandas as pd

content = io.BytesIO("""\
IDs    timestamp     value
0      2010-10-30     1
0      2010-11-30     2
0      2011-11-30     3
1      2000-01-01     300
1      2007-01-01     33
1      2010-01-01     400
2      2000-01-01     11""")
df = pd.read_table(content, header=0, sep='\s+', parse_dates=[1])
df.set_index(['IDs', 'timestamp'], inplace=True)
pd.stats.moments.rolling_sum(df, window=2)

And the output for this is:

                value
IDs timestamp
0   2010-10-30    NaN
    2010-11-30      3
    2011-11-30      5
1   2000-01-01    303
    2007-01-01    333
    2010-01-01    433
2   2000-01-01    411

Notice that the window runs across ID boundaries: the first row of ID 1 includes the last value of ID 0 (3 + 300 = 303), and the first row of ID 2 includes the last value of ID 1 (400 + 11 = 411). I don't want that, because it corrupts my calculations. One workaround is to group by IDs, loop over the groups, and apply rolling_sum to each one.

I am sure there is a function to help me do this without using loops.

1 Answer

Group by ID first, then apply the rolling sum within each group (rolling_sum is also available in the top-level namespace, as pd.rolling_sum):

In [18]: df.groupby(level='IDs').apply(lambda x: pd.rolling_sum(x,2))
Out[18]: 
                value
IDs timestamp        
0   2010-10-30    NaN
    2010-11-30      3
    2011-11-30      5
1   2000-01-01    NaN
    2007-01-01    333
    2010-01-01    433
2   2000-01-01    NaN
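
For readers on a recent pandas release, where pd.rolling_sum no longer exists, the same group-then-roll idea can be sketched with the .rolling() method. This is a minimal sketch and not part of the original answer: it assumes Python 3 and a pandas version that provides Series.rolling, and the names raw and rolled are only illustrative.

```python
import io

import pandas as pd

# Same sample data as in the question.
raw = """\
IDs    timestamp     value
0      2010-10-30     1
0      2010-11-30     2
0      2011-11-30     3
1      2000-01-01     300
1      2007-01-01     33
1      2010-01-01     400
2      2000-01-01     11"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+", parse_dates=["timestamp"])
df = df.set_index(["IDs", "timestamp"])

# Roll within each ID separately. transform() returns a result aligned to
# the original (IDs, timestamp) index, so no window crosses an ID boundary.
rolled = df.groupby(level="IDs")["value"].transform(
    lambda s: s.rolling(2).sum()
)
print(rolled)
```

Using transform here is a design choice: it keeps the original two-level index, so the result can be assigned straight back as a new column, for example df["rolling_value"] = rolled.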
Answered 2013-10-04T18:30:53.187