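I have a tricky pandas problem. I want to compute a cumulative sum of value ordered by the start_date timestamp. Each row also has an end_date, and if that end_date is later than 1970, the row's value should be subtracted from the sum again.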
Sample data:
import pandas as pd

df = pd.DataFrame({
    'start_date': ['2014-09-18 14:46:58.563', '2015-04-18 07:10:31.365', '2014-09-18 14:46:58.563', '2014-12-18 08:41:32.466', '2015-04-18 08:00:00.000'],
    'end_date': ['2015-04-18 07:10:31.364', '1970-01-01 00:00:00.000', '1970-01-01 00:00:00.000', '2015-04-18 07:10:31.518', '1970-01-01 00:00:00.000'],
    'value': [2300, 2300, 2300, 2300, 2300],
    'IDX': [1, 1, 2, 2, 3],
    'IDX_TOTAL': [1, 1, 1, 1, 1],  # grouping column shown in the printed frame
})
                start_date                 end_date   value  IDX  IDX_TOTAL
0  2014-09-18 14:46:58.563  2015-04-18 07:10:31.364  2300.0    1          1
1  2015-04-18 07:10:31.365  1970-01-01 00:00:00.000  2300.0    1          1
2  2014-09-18 14:46:58.563  1970-01-01 00:00:00.000  2300.0    2          1
3  2014-12-18 08:41:32.466  2015-04-18 07:10:31.518  2300.0    2          1
4  2015-04-18 08:00:00.000  1970-01-01 00:00:00.000  2300.0    3          1
What I tried:
df["start_date"] = pd.to_datetime(df["start_date"])
df.sort_values("start_date", inplace=True)
df["start_date_2"] = df["start_date"]
# x.iloc[-1] picks the last value in each monthly bucket; freq='M' means month end ('ME' on pandas >= 2.2)
df.groupby(['IDX_TOTAL', pd.Grouper(key='start_date_2', freq='M')])['value'].apply(lambda x: x.iloc[-1]).cumsum()
What I expect:
IDX_TOTAL  start_date        value
1          2014-09-18 14:46  4600.0
           2014-12-18 8:41   4600.0
           2015-04-18 7:10   4600.0
           2015-04-18 8:00   6900.0
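
One reading of the rules that reproduces the expected numbers, offered as a rough sketch and not a definitive answer: at each distinct start_date, a row counts if it has already started and has not yet ended, and each IDX contributes only once, through its most recently started active row. Two things here are assumptions rather than facts from the data: that an end_date of 1970-01-01 means "no end", and that each IDX is counted once. The helper name running_total is hypothetical.

```python
import pandas as pd

# Sample data from the question; IDX_TOTAL is taken from the printed frame.
df = pd.DataFrame({
    'start_date': ['2014-09-18 14:46:58.563', '2015-04-18 07:10:31.365',
                   '2014-09-18 14:46:58.563', '2014-12-18 08:41:32.466',
                   '2015-04-18 08:00:00.000'],
    'end_date': ['2015-04-18 07:10:31.364', '1970-01-01 00:00:00.000',
                 '1970-01-01 00:00:00.000', '2015-04-18 07:10:31.518',
                 '1970-01-01 00:00:00.000'],
    'value': [2300, 2300, 2300, 2300, 2300],
    'IDX': [1, 1, 2, 2, 3],
    'IDX_TOTAL': [1, 1, 1, 1, 1],
})
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])

# Assumption: an end_date of 1970-01-01 marks a row that never ends.
NO_END = pd.Timestamp('1970-01-01')


def running_total(group):
    """Hypothetical helper: sum of active values at each distinct start_date."""
    never_ends = group['end_date'] == NO_END
    totals = {}
    for t in group['start_date'].drop_duplicates().sort_values():
        # A row is active at t if it has started and has not ended yet.
        started = group['start_date'] <= t
        not_ended = never_ends | (group['end_date'] > t)
        active = group[started & not_ended]
        # Assumption: each IDX counts once, via its latest started active row.
        latest_per_idx = (
            active.sort_values('start_date').groupby('IDX')['value'].last()
        )
        totals[t] = latest_per_idx.sum()
    return pd.Series(totals, name='value')


# Compute the totals separately for every IDX_TOTAL group.
result = pd.concat(
    {key: running_total(group) for key, group in df.groupby('IDX_TOTAL')},
    names=['IDX_TOTAL', 'start_date'],
)
print(result)
```

This loops over every distinct start_date and rescans the group each time, so it is only meant for small frames. If the intended rule is a plain "+value at start, -value at end" event sum, the totals would differ from the expected table, so the rule itself is worth confirming first.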