I have a multi-index dataframe in pandas, where index is on ID and timestamp. I want to be able to compute a time-series rolling sum of each ID but I can't seem to figure out how to do it without loops.
content = io.BytesIO("""\
IDs timestamp value
0 2010-10-30 1
0 2010-11-30 2
0 2011-11-30 3
1 2000-01-01 300
1 2007-01-01 33
1 2010-01-01 400
2 2000-01-01 11""")
df = pd.read_table(content, header=0, sep='\s+', parse_dates=[1])
df.set_index(['IDs', 'timestamp'], inplace=True)
pd.stats.moments.rolling_sum(df,window=2
And the output for this is:
value
IDs timestamp
0 2010-10-30 NaN
2010-11-30 3
2011-11-30 5
1 2000-01-01 303
2007-01-01 333
2010-01-01 433
2 2000-01-01 411
Notice the overlap between IDs 0 and 1 and 1 and 2 at the edges (I don't want that, messes up my calculations). One possible way to get around this is to use groupby on IDs and then loop through that groupby and then apply a rolling_sum.
I am sure there is a function to help me do this without using loops.