I have the following dataframe:
import pandas as pd
data = {
"date": {
0: "2019-02-01",
2: "2019-02-07",
3: "2019-02-15",
5: "2019-02-18",
12: "2019-03-02",
17: "2019-03-06",
19: "2019-03-13",
21: "2019-03-20",
},
"date_month_start": {
0: "2019-02-01",
2: "2019-02-01",
3: "2019-02-01",
5: "2019-02-01",
12: "2019-03-01",
17: "2019-03-01",
19: "2019-03-01",
21: "2019-03-01",
},
"account": {0: 67, 2: 69, 3: 67, 5: 67, 12: 67, 17: 67, 19: 67, 21: 69,},
"balance": {
0: 1705.65,
2: 1929.49,
3: 2004.46,
5: 2595.54,
12: 4428.41,
17: 2301.5,
19: 3089.82,
21: 3141.19,
},
"amount": {0: 0, 2: 0, 3: 0, 5: 0, 12: 0, 17: 0, 19: 0, 21: 0},
"category__name": {
0: "aaa",
2: "aaa",
3: "bbb",
5: "aaa",
12: "aaa",
17: "bbb",
19: "aaa",
21: "aaa",
},
}
df = pd.DataFrame(data)
df["date"] = pd.to_datetime(df["date"])
df["date_month_start"] = pd.to_datetime(df["date_month_start"])
df.sort_values('date', inplace=True)
Which results in:
         date date_month_start  account  balance  amount category__name
0  2019-02-01       2019-02-01       67  1705.65       0            aaa
2  2019-02-07       2019-02-01       69  1929.49       0            aaa
3  2019-02-15       2019-02-01       67  2004.46       0            bbb
5  2019-02-18       2019-02-01       67  2595.54       0            aaa
12 2019-03-02       2019-03-01       67  4428.41       0            aaa
17 2019-03-06       2019-03-01       67  2301.50       0            bbb
19 2019-03-13       2019-03-01       67  3089.82       0            aaa
21 2019-03-20       2019-03-01       69  3141.19       0            aaa
For each combination of account and category__name, I need to determine the first (earliest) date_month_start. Then, within that first month, I need to set the amount of the group's last row to that row's balance.
The result will be:
         date date_month_start  account  balance   amount category__name
0  2019-02-01       2019-02-01       67  1705.65        0            aaa
2  2019-02-07       2019-02-01       69  1929.49  1929.49            aaa
3  2019-02-15       2019-02-01       67  2004.46  2004.46            bbb
5  2019-02-18       2019-02-01       67  2595.54  2595.54            aaa
12 2019-03-02       2019-03-01       67  4428.41        0            aaa
17 2019-03-06       2019-03-01       67  2301.50        0            bbb
19 2019-03-13       2019-03-01       67  3089.82        0            aaa
21 2019-03-20       2019-03-01       69  3141.19        0            aaa
In other words:
- The first date_month_start for account = 69 and category__name = aaa is 2019-02-01. Set the amount of the last row of that group to its balance, i.e. 1929.49.
- The first date_month_start for account = 67 and category__name = bbb is 2019-02-01. Set the amount of the last row of that group to its balance, i.e. 2004.46.
- The first date_month_start for account = 67 and category__name = aaa is 2019-02-01. Set the amount of the last row of that group to its balance, i.e. 2595.54.
In this example the earliest date_month_start happens to be the same for every group, but that is not always so.
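One possible way to express the requirement above is sketched here. It is only an illustration under a few assumptions: the frame is already sorted by date (so "last row" means the latest date), the amount column may be cast to float, and the names keys, first_month and last_idx are made up for this example. It rebuilds the sample data in a compact form so that it runs on its own.

```python
import pandas as pd

# Same sample data as above, written compactly with the original index labels.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2019-02-01", "2019-02-07", "2019-02-15", "2019-02-18",
             "2019-03-02", "2019-03-06", "2019-03-13", "2019-03-20"]
        ),
        "date_month_start": pd.to_datetime(["2019-02-01"] * 4 + ["2019-03-01"] * 4),
        "account": [67, 69, 67, 67, 67, 67, 67, 69],
        "balance": [1705.65, 1929.49, 2004.46, 2595.54,
                    4428.41, 2301.5, 3089.82, 3141.19],
        "amount": [0] * 8,
        "category__name": ["aaa", "aaa", "bbb", "aaa", "aaa", "bbb", "aaa", "aaa"],
    },
    index=[0, 2, 3, 5, 12, 17, 19, 21],
).sort_values("date")

# Assumption: amount may hold floats, so cast before copying balances into it.
df["amount"] = df["amount"].astype(float)

keys = ["account", "category__name"]  # hypothetical helper name

# Earliest date_month_start per (account, category__name), aligned to every row.
first_month = df.groupby(keys)["date_month_start"].transform("min")

# Keep only the rows in each group's first month; since df is sorted by date,
# tail(1) then picks the latest of those rows per group.
last_idx = df[df["date_month_start"] == first_month].groupby(keys).tail(1).index

# Copy the balance into amount for exactly those rows.
df.loc[last_idx, "amount"] = df.loc[last_idx, "balance"]

print(df)
```

Using transform("min") keeps the per-group minimum aligned with the original rows, so the comparison still works when different groups start in different months.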