I am trying to discretize my dataframe which looks like this:
| | Start Date | Park Duration (mins) | Charge Duration (mins) | Energy (kWh) |
---|---|---|---|---|
49698 | 2016-01-01 11:48:00 | 230 | 92.0 | 3.034643 |
49710 | 2016-01-01 13:43:00 | 225 | 225.0 | 12.427662 |
49732 | 2016-01-01 22:43:00 | 708 | 111.0 | 10.752058 |
49736 | 2016-01-02 07:09:00 | 149 | 149.0 | 11.160776 |
49745 | 2016-01-02 10:29:00 | 156 | 156.0 | 10.298505 |
49758 | 2016-01-02 13:06:00 | 84 | 84.0 | 2.904127 |
49768 | 2016-01-02 15:00:00 | 27 | 26.0 | 2.573858 |
49773 | 2016-01-02 15:31:00 | 174 | 152.0 | 14.961943 |
49775 | 2016-01-02 16:01:00 | 195 | 167.0 | 16.317518 |
49790 | 2016-01-02 19:37:00 | 108 | 108.0 | 10.829344 |
49791 | 2016-01-02 19:56:00 | 289 | 26.0 | 2.552439 |
49802 | 2016-01-03 09:23:00 | 58 | 58.0 | 5.243358 |
49803 | 2016-01-03 09:33:00 | 264 | 134.0 | 6.782309 |
49813 | 2016-01-03 11:12:00 | 240 | 0.0 | 0.008115 |
49825 | 2016-01-03 14:12:00 | 97 | 96.0 | 5.29069 |
49833 | 2016-01-03 15:52:00 | 201 | 201.0 | 16.058235 |
49834 | 2016-01-03 15:52:00 | 53 | 52.0 | 5.304866 |
49840 | 2016-01-03 17:27:00 | 890 | 219.0 | 15.878921 |
49857 | 2016-01-04 05:57:00 | 198 | 127.0 | 6.368932 |
49871 | 2016-01-04 08:48:00 | 75 | 74.0 | 5.99877 |
What I want to do is to resample it into 2-hour slots, like so:
Start Date | Energy (kWh) | Charge Duration (mins) | Fee |
---|---|---|---|
2016-01-01 10:00:00 | 3.034643 | 92.0 | 0.0 |
2016-01-01 12:00:00 | 12.427662 | 225.0 | 0.0 |
2016-01-01 14:00:00 | 0.0 | 0.0 | 0.0 |
2016-01-01 16:00:00 | 0.0 | 0.0 | 0.0 |
2016-01-01 18:00:00 | 0.0 | 0.0 | 0.0 |
2016-01-01 20:00:00 | 0.0 | 0.0 | 0.0 |
2016-01-01 22:00:00 | 10.752058 | 111.0 | 0.0 |
2016-01-02 00:00:00 | 0.0 | 0.0 | 0.0 |
2016-01-02 02:00:00 | 0.0 | 0.0 | 0.0 |
2016-01-02 04:00:00 | 0.0 | 0.0 | 0.0 |
Which I did with:
data.resample('2H', on='Start Date').agg({'Energy (kWh)': 'sum', 'Charge Duration (mins)': 'sum'})
However, the problem is that the data spills over between slots. As the first row shows, the Charge Duration is 92 minutes, but only 12 of those 92 minutes fall in the 10:00:00 - 12:00:00 time slot; the way I used resample assigns the whole charge duration to that slot. The behaviour I want is to split the duration across the time slots based on Start Date and Charge Duration, so that 12 minutes fall into the first slot and the remaining 80 fall into the next. There are also EV charging sessions that span 3 periods. I hope that makes sense. How would you go about it?
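To make the intended behaviour concrete, here is a minimal sketch of the splitting I have in mind, not a tuned solution. It assumes `Start Date` is already a datetime column and that energy is delivered at a constant rate over the charge duration (my assumption; the data does not say so). The function name `split_into_slots` and its parameters are made up for illustration.

```python
import pandas as pd


def split_into_slots(df, freq="2h", start_col="Start Date",
                     dur_col="Charge Duration (mins)",
                     energy_col="Energy (kWh)"):
    """Spread each charging session over the fixed-width slots it overlaps.

    Hypothetical helper: energy is assumed to be delivered uniformly
    over the charge duration.
    """
    slot = pd.Timedelta(freq)
    pieces = []
    for start, minutes, energy in zip(df[start_col], df[dur_col], df[energy_col]):
        slot_start = start.floor(freq)
        if minutes <= 0:
            # Zero-length session: keep its energy in the slot where it started.
            pieces.append((slot_start, 0.0, energy))
            continue
        end = start + pd.Timedelta(minutes=minutes)
        while slot_start < end:
            # Minutes of this session that fall inside the current slot.
            overlap = min(end, slot_start + slot) - max(start, slot_start)
            overlap_mins = overlap.total_seconds() / 60
            pieces.append((slot_start, overlap_mins, energy * overlap_mins / minutes))
            slot_start += slot

    out = pd.DataFrame(pieces, columns=[start_col, dur_col, energy_col])
    # Sum the pieces per slot; resample also fills empty slots with 0.
    return out.set_index(start_col).resample(freq).sum()


# First two rows of the data above.
demo = pd.DataFrame({
    "Start Date": pd.to_datetime(["2016-01-01 11:48:00", "2016-01-01 13:43:00"]),
    "Charge Duration (mins)": [92.0, 225.0],
    "Energy (kWh)": [3.034643, 12.427662],
})
result = split_into_slots(demo)
print(result)
```

For these two rows the charge minutes per slot come out as 12 (10:00), 97 (12:00, i.e. 80 + 17), 120 (14:00) and 88 (16:00), which is the split I am after. The loop is plain Python, so it may be slow on a large frame, and the slot alignment of `floor` and `resample` is only guaranteed to agree for frequencies that divide a day evenly.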
Here is the original data as comma-separated values:
,Start Date,Park Duration (mins),Charge Duration (mins),Energy (kWh)
49698,2016-01-01 11:48:00, 230 ,92.0,3.034643
49710,2016-01-01 13:43:00, 225 ,225.0,12.427662
49732,2016-01-01 22:43:00, 708 ,111.0,10.752058
49736,2016-01-02 07:09:00, 149 ,149.0,11.160776
49745,2016-01-02 10:29:00, 156 ,156.0,10.298505
49758,2016-01-02 13:06:00, 84 ,84.0,2.904127
49768,2016-01-02 15:00:00, 27 ,26.0,2.573858
49773,2016-01-02 15:31:00, 174 ,152.0,14.961943
49775,2016-01-02 16:01:00, 195 ,167.0,16.317518
49790,2016-01-02 19:37:00, 108 ,108.0,10.829344
49791,2016-01-02 19:56:00, 289 ,26.0,2.552439
49802,2016-01-03 09:23:00, 58 ,58.0,5.243358
49803,2016-01-03 09:33:00, 264 ,134.0,6.782309
49813,2016-01-03 11:12:00, 240 ,0.0,0.008115
49825,2016-01-03 14:12:00, 97 ,96.0,5.29069
49833,2016-01-03 15:52:00, 201 ,201.0,16.058235
49834,2016-01-03 15:52:00, 53 ,52.0,5.304866
49840,2016-01-03 17:27:00, 890 ,219.0,15.878921
49857,2016-01-04 05:57:00, 198 ,127.0,6.368932
49871,2016-01-04 08:48:00, 75 ,74.0,5.99877