
I am trying to discretize my dataframe which looks like this:

Start Date Park Duration (mins) Charge Duration (mins) Energy (kWh)
49698 2016-01-01 11:48:00 230 92.0 3.034643
49710 2016-01-01 13:43:00 225 225.0 12.427662
49732 2016-01-01 22:43:00 708 111.0 10.752058
49736 2016-01-02 07:09:00 149 149.0 11.160776
49745 2016-01-02 10:29:00 156 156.0 10.298505
49758 2016-01-02 13:06:00 84 84.0 2.904127
49768 2016-01-02 15:00:00 27 26.0 2.573858
49773 2016-01-02 15:31:00 174 152.0 14.961943
49775 2016-01-02 16:01:00 195 167.0 16.317518
49790 2016-01-02 19:37:00 108 108.0 10.829344
49791 2016-01-02 19:56:00 289 26.0 2.552439
49802 2016-01-03 09:23:00 58 58.0 5.243358
49803 2016-01-03 09:33:00 264 134.0 6.782309
49813 2016-01-03 11:12:00 240 0.0 0.008115
49825 2016-01-03 14:12:00 97 96.0 5.29069
49833 2016-01-03 15:52:00 201 201.0 16.058235
49834 2016-01-03 15:52:00 53 52.0 5.304866
49840 2016-01-03 17:27:00 890 219.0 15.878921
49857 2016-01-04 05:57:00 198 127.0 6.368932
49871 2016-01-04 08:48:00 75 74.0 5.99877

What I want to do is to sample it in to 2 hour slots, like so:

Start Date Energy (kWh) Charge Duration (mins) Fee
2016-01-01 10:00:00 3.034643 92.0 0.0
2016-01-01 12:00:00 12.427662 225.0 0.0
2016-01-01 14:00:00 0.0 0.0 0.0
2016-01-01 16:00:00 0.0 0.0 0.0
2016-01-01 18:00:00 0.0 0.0 0.0
2016-01-01 20:00:00 0.0 0.0 0.0
2016-01-01 22:00:00 10.752058 111.0 0.0
2016-01-02 00:00:00 0.0 0.0 0.0
2016-01-02 02:00:00 0.0 0.0 0.0
2016-01-02 04:00:00 0.0 0.0 0.0

Which I did with:

data.resample('2H', on='Start Date').agg({'Energy (kWh)': 'sum', 'Charge Duration (mins)': 'sum'})

However, the problem is that the data spills over the slot boundaries. As you can see from the first row, the Charge Duration is 92 mins, but only 12 of those 92 minutes fall in the 10:00:00 - 12:00:00 time slot; the way I used resample assigned the whole charge duration to that slot. The behaviour I want is to split the values proportionally across the time slots based on Start Date and Charge Duration, so that 12 minutes fall into the first slot and the remaining 80 fall into the next. There are also instances of EV charging sessions spanning three periods. I hope that makes sense. How would you go about it?
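To make the split concrete, here is the arithmetic for that first row (a small illustrative sketch; the variable names are only for this example):

import pandas as pd

start = pd.Timestamp("2016-01-01 11:48:00")
end = start + pd.Timedelta(minutes=92)                    # 13:20
boundary = start.floor("2H") + pd.Timedelta(hours=2)      # 12:00, end of the first slot
in_first_slot = (boundary - start).total_seconds() / 60   # 12.0 minutes
in_second_slot = (end - boundary).total_seconds() / 60    # 80.0 minutes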

Here is the original data as comma-separated values:

,Start Date,Park Duration (mins),Charge Duration (mins),Energy (kWh)
49698,2016-01-01 11:48:00, 230 ,92.0,3.034643
49710,2016-01-01 13:43:00, 225 ,225.0,12.427662
49732,2016-01-01 22:43:00, 708 ,111.0,10.752058
49736,2016-01-02 07:09:00, 149 ,149.0,11.160776
49745,2016-01-02 10:29:00, 156 ,156.0,10.298505
49758,2016-01-02 13:06:00, 84 ,84.0,2.904127
49768,2016-01-02 15:00:00, 27 ,26.0,2.573858
49773,2016-01-02 15:31:00, 174 ,152.0,14.961943
49775,2016-01-02 16:01:00, 195 ,167.0,16.317518
49790,2016-01-02 19:37:00, 108 ,108.0,10.829344
49791,2016-01-02 19:56:00, 289 ,26.0,2.552439
49802,2016-01-03 09:23:00, 58 ,58.0,5.243358
49803,2016-01-03 09:33:00, 264 ,134.0,6.782309
49813,2016-01-03 11:12:00, 240 ,0.0,0.008115
49825,2016-01-03 14:12:00, 97 ,96.0,5.29069
49833,2016-01-03 15:52:00, 201 ,201.0,16.058235
49834,2016-01-03 15:52:00, 53 ,52.0,5.304866
49840,2016-01-03 17:27:00, 890 ,219.0,15.878921
49857,2016-01-04 05:57:00, 198 ,127.0,6.368932
49871,2016-01-04 08:48:00, 75 ,74.0,5.99877


1 Answer


I don't see a fully straightforward approach to this. I have effectively built a DataFrame for each row and used ratios to split the values across the target rows.

import io
import numpy as np
import pandas as pd

df = pd.read_csv(io.StringIO("""    Start Date  Park Duration (mins)    Charge Duration (mins)  Energy (kWh)
49698   2016-01-01 11:48:00 230 92.0    3.034643
49710   2016-01-01 13:43:00 225 225.0   12.427662
49732   2016-01-01 22:43:00 708 111.0   10.752058
49736   2016-01-02 07:09:00 149 149.0   11.160776
49745   2016-01-02 10:29:00 156 156.0   10.298505
49758   2016-01-02 13:06:00 84  84.0    2.904127
49768   2016-01-02 15:00:00 27  26.0    2.573858
49773   2016-01-02 15:31:00 174 152.0   14.961943
49775   2016-01-02 16:01:00 195 167.0   16.317518
49790   2016-01-02 19:37:00 108 108.0   10.829344
49791   2016-01-02 19:56:00 289 26.0    2.552439
49802   2016-01-03 09:23:00 58  58.0    5.243358
49803   2016-01-03 09:33:00 264 134.0   6.782309
49813   2016-01-03 11:12:00 240 0.0 0.008115
49825   2016-01-03 14:12:00 97  96.0    5.29069
49833   2016-01-03 15:52:00 201 201.0   16.058235
49834   2016-01-03 15:52:00 53  52.0    5.304866
49840   2016-01-03 17:27:00 890 219.0   15.878921
49857   2016-01-04 05:57:00 198 127.0   6.368932
49871   2016-01-04 08:48:00 75  74.0    5.99877"""), sep="\t", index_col=0)

df["Start Date"] = pd.to_datetime(df["Start Date"])

def proportionalsplit(s, freq="2H"):
    # start and end of the charging session
    st = s["Start Date"]
    et = st + pd.Timedelta(minutes=s["Charge Duration (mins)"])
    # every 2H bucket start the session touches
    tr = pd.date_range(st.floor(freq), et, freq=freq)
    lmin = {"2H": 120}  # bucket length in minutes
    # minutes of the session inside each bucket: for the first bucket measure from the
    # session start to the next boundary, for the others from the bucket start to the
    # session end, capped at the bucket length; /(10**9*60) converts ns to minutes
    ratio = np.minimum((np.where(tr < st, tr.shift() - st, et - tr) / (10**9 * 60)).astype(int),
                       np.full(len(tr), lmin[freq]))
    # ratio of how numeric values should be split across new buckets
    ratio = ratio / ratio.sum()
    return {"Start Date": tr,
            "Original Duration": np.full(len(tr), s["Charge Duration (mins)"]),
            "Original Start": np.full(len(tr), s["Start Date"]),
            "Original Index": np.full(len(tr), s.name),
            "Charge Duration (mins)": s["Charge Duration (mins)"] * ratio,
            "Energy (kWh)": s["Energy (kWh)"] * ratio,
           }

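Applied to a single row, the split looks like this (a quick spot check, not part of the original output):

# e.g. the first session (index 49698): 92 minutes starting at 11:48
pd.DataFrame(proportionalsplit(df.loc[49698]))
# -> two rows, for the 10:00 and 12:00 slots, with Charge Duration ~12.0 and ~80.0 minutes
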
# build one small frame per original row and stack them
df2 = pd.concat([pd.DataFrame(v) for v in df.apply(proportionalsplit, axis=1).values]).reset_index(drop=True)
# sanity check: do the split values still sum to the original totals?
print(df2["Energy (kWh)"].sum().round(3) == df["Energy (kWh)"].sum().round(3),
      df2["Charge Duration (mins)"].sum().round(3) == df["Charge Duration (mins)"].sum().round(3))

# aggregate back to the 2H slot starts: list the contributing originals, sum the split values
df3 = df2.groupby(["Start Date"]).agg({**{c: lambda s: list(s) for c in df2.columns if "Original" in c},
                                       **{c: "sum" for c in ["Charge Duration (mins)", "Energy (kWh)"]}})

output

                               Original Duration                                                                        Original Start                Original Index  Charge Duration (mins)  Energy (kWh)
Start Date                                                                                                                                                                                                
2016-01-01 10:00:00                       [92.0]                                                                 [2016-01-01 11:48:00]                       [49698]                    12.0      0.395823
2016-01-01 12:00:00                [92.0, 225.0]                                            [2016-01-01 11:48:00, 2016-01-01 13:43:00]                [49698, 49710]                    97.0      3.577799
2016-01-01 14:00:00                      [225.0]                                                                 [2016-01-01 13:43:00]                       [49710]                   120.0      6.628086
2016-01-01 16:00:00                      [225.0]                                                                 [2016-01-01 13:43:00]                       [49710]                    88.0      4.860597
2016-01-01 22:00:00                      [111.0]                                                                 [2016-01-01 22:43:00]                       [49732]                    77.0      7.458635
2016-01-02 00:00:00                      [111.0]                                                                 [2016-01-01 22:43:00]                       [49732]                    34.0      3.293423
2016-01-02 06:00:00                      [149.0]                                                                 [2016-01-02 07:09:00]                       [49736]                    51.0      3.820131
2016-01-02 08:00:00                      [149.0]                                                                 [2016-01-02 07:09:00]                       [49736]                    98.0      7.340645
2016-01-02 10:00:00                      [156.0]                                                                 [2016-01-02 10:29:00]                       [49745]                    91.0      6.007461
2016-01-02 12:00:00                [156.0, 84.0]                                            [2016-01-02 10:29:00, 2016-01-02 13:06:00]                [49745, 49758]                   119.0      6.157983
2016-01-02 14:00:00          [84.0, 26.0, 152.0]                       [2016-01-02 13:06:00, 2016-01-02 15:00:00, 2016-01-02 15:31:00]         [49758, 49768, 49773]                    85.0      6.465627
2016-01-02 16:00:00               [152.0, 167.0]                                            [2016-01-02 15:31:00, 2016-01-02 16:01:00]                [49773, 49775]                   239.0     23.439513
2016-01-02 18:00:00  [152.0, 167.0, 108.0, 26.0]  [2016-01-02 15:31:00, 2016-01-02 16:01:00, 2016-01-02 19:37:00, 2016-01-02 19:56:00]  [49773, 49775, 49790, 49791]                    78.0      7.684299
2016-01-02 20:00:00                [108.0, 26.0]                                            [2016-01-02 19:37:00, 2016-01-02 19:56:00]                [49790, 49791]                   107.0     10.682851
2016-01-03 08:00:00                [58.0, 134.0]                                            [2016-01-03 09:23:00, 2016-01-03 09:33:00]                [49802, 49803]                    64.0      4.711485
2016-01-03 10:00:00           [58.0, 134.0, 0.0]                       [2016-01-03 09:23:00, 2016-01-03 09:33:00, 2016-01-03 11:12:00]         [49802, 49803, 49813]                   128.0      7.322297
2016-01-03 14:00:00          [96.0, 201.0, 52.0]                       [2016-01-03 14:12:00, 2016-01-03 15:52:00, 2016-01-03 15:52:00]         [49825, 49833, 49834]                   112.0      6.745957
2016-01-03 16:00:00         [201.0, 52.0, 219.0]                       [2016-01-03 15:52:00, 2016-01-03 15:52:00, 2016-01-03 17:27:00]         [49833, 49834, 49840]                   197.0     16.468453
2016-01-03 18:00:00               [201.0, 219.0]                                            [2016-01-03 15:52:00, 2016-01-03 17:27:00]                [49833, 49840]                   193.0     14.532874
2016-01-03 20:00:00                      [219.0]                                                                 [2016-01-03 17:27:00]                       [49840]                    66.0      4.785428
2016-01-04 04:00:00                      [127.0]                                                                 [2016-01-04 05:57:00]                       [49857]                     3.0      0.150447
2016-01-04 06:00:00                      [127.0]                                                                 [2016-01-04 05:57:00]                       [49857]                   120.0      6.017889
2016-01-04 08:00:00                [127.0, 74.0]                                            [2016-01-04 05:57:00, 2016-01-04 08:48:00]                [49857, 49871]                    76.0      6.037237
2016-01-04 10:00:00                       [74.0]                                                                 [2016-01-04 08:48:00]                       [49871]                     2.0      0.162129
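
If you also want the evenly spaced 2-hour index with zero-filled empty slots shown in the question, one option (a sketch building on df2 above, not part of the output shown here) is to resample df2 instead of grouping it; empty bins then sum to 0:

df4 = df2.resample("2H", on="Start Date")[["Charge Duration (mins)", "Energy (kWh)"]].sum()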
Answered 2021-02-19T16:02:50.073