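I have a tricky pandas problem. I want to compute a cumulative sum ordered by the start_date timestamp, taking end_date into account: if a row's end_date is later than 1970, its value should be subtracted from the sum.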

Sample data


import pandas as pd

df = pd.DataFrame({
    'start_date': ['2014-09-18 14:46:58.563', '2015-04-18 07:10:31.365', '2014-09-18 14:46:58.563', '2014-12-18 08:41:32.466', '2015-04-18 08:00:00.000'],
    'end_date': ['2015-04-18 07:10:31.364', '1970-01-01 00:00:00.000', '1970-01-01 00:00:00.000', '2015-04-18 07:10:31.518', '1970-01-01 00:00:00.000'],
    'value': [2300, 2300, 2300, 2300, 2300],
    'IDX': [1, 1, 2, 2, 3],
    # IDX_TOTAL is 1 for every row, as in the printed frame below
    'IDX_TOTAL': [1, 1, 1, 1, 1],
})

    start_date              end_date                value   IDX IDX_TOTAL
0   2014-09-18 14:46:58.563 2015-04-18 07:10:31.364 2300.0  1   1
1   2015-04-18 07:10:31.365 1970-01-01 00:00:00.000 2300.0  1   1
2   2014-09-18 14:46:58.563 1970-01-01 00:00:00.000 2300.0  2   1
3   2014-12-18 08:41:32.466 2015-04-18 07:10:31.518 2300.0  2   1
4   2015-04-18 08:00:00.000 1970-01-01 00:00:00.000 2300.0  3   1

What I tried:

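# Parse the timestamps and order the rows chronologically
df["start_date"] = pd.to_datetime(df["start_date"])

df.sort_values("start_date", inplace=True)

# Keep a copy of start_date to use as the grouping key
df["start_date_2"] = df["start_date"]

# Take the last value in each month per IDX_TOTAL, then accumulate
df.groupby(['IDX_TOTAL', pd.Grouper(key='start_date_2', freq='M')])['value'].apply(lambda x: x.iloc[-1]).cumsum()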

What I expect:

   IDX_TOTAL   start_date         value
   1           2014-09-18 14:46   4600.0
               2014-12-18 8:41    4600.0
               2015-04-18 7:10    4600.0
               2015-04-18 8:00    6900.0
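
For reference, here is a minimal sketch of one literal reading of the description: every row adds its value at start_date, and a row whose end_date is later than 1970 removes it again at end_date. Treating 1970-01-01 as a "no end date" placeholder is an assumption, and the names `has_end`, `starts`, `ends`, `events` and `running_total` are only illustrative. Under this reading the total at 2014-12-18 08:41 comes out as 6900 rather than the 4600 shown in the expected table, so the expected output may follow a different rule.

```python
import pandas as pd

df = pd.DataFrame({
    'start_date': ['2014-09-18 14:46:58.563', '2015-04-18 07:10:31.365',
                   '2014-09-18 14:46:58.563', '2014-12-18 08:41:32.466',
                   '2015-04-18 08:00:00.000'],
    'end_date': ['2015-04-18 07:10:31.364', '1970-01-01 00:00:00.000',
                 '1970-01-01 00:00:00.000', '2015-04-18 07:10:31.518',
                 '1970-01-01 00:00:00.000'],
    'value': [2300, 2300, 2300, 2300, 2300],
    'IDX': [1, 1, 2, 2, 3],
})
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])

# Assumption: 1970-01-01 is a placeholder meaning "no end date".
has_end = df['end_date'] > pd.Timestamp('1970-01-01')

# Every row adds its value at start_date ...
starts = pd.DataFrame({'ts': df['start_date'], 'delta': df['value']})
# ... and rows with a real end date remove it again at end_date.
ends = pd.DataFrame({'ts': df.loc[has_end, 'end_date'],
                     'delta': -df.loc[has_end, 'value']})

events = pd.concat([starts, ends], ignore_index=True)

# Net change per timestamp in chronological order, then the running total.
running_total = events.groupby('ts')['delta'].sum().sort_index().cumsum()
print(running_total)
```

The result is indexed by every timestamp at which the total changes, end dates included; selecting only the start_date timestamps with `running_total.loc[...]` would give one row per start.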