This is a contrived example to keep the data generation easy, but the underlying problem should be relevant to a wide audience.
I have a time-series of measurements like so:
In [1]: import numpy as np; import pandas as pd
In [2]: index = pd.date_range(start="18:10", periods=20, freq='min')
In [3]: df = pd.DataFrame(np.random.randn(20, 3), columns=list('abc'), index=index)
In [4]: df.head()
Out[4]:
                            a         b         c
2013-02-27 18:10:00 -1.344753  0.438351  1.561849
2013-02-27 18:11:00  1.715643  1.601984 -0.027408
2013-02-27 18:12:00 -0.142264 -0.049462  0.482493
2013-02-27 18:13:00  0.132617  0.737902 -0.347620
2013-02-27 18:14:00  1.277257  0.083401  0.649422
In between the 'real' measurements, calibration measurements are taken, but at a much lower frequency than the measurements themselves, e.g. something like this:
In [5]: calindex = pd.date_range("18:12:30",periods=4,freq='5min')
In [6]: caldata = pd.Series([10,20,30,40],index = calindex)
In [7]: caldata
Out[7]:
2013-02-27 18:12:30 10
2013-02-27 18:17:30 20
2013-02-27 18:22:30 30
2013-02-27 18:27:30 40
Freq: 5T
The general idea is to apply these calibration data to the measurements. For this, I would like to distribute (broadcast) the calibration data using a 'closest-time' approach: I want to generate another column, called 'offsets' for example, that holds, for each measurement row, the calibration value that was recorded closest in time to that measurement.
Therefore I am after an end result like this:
In [14]: df
Out[14]:
                            a         b         c  offsets
2013-02-27 18:10:00 -1.344753  0.438351  1.561849       10
2013-02-27 18:11:00  1.715643  1.601984 -0.027408       10
2013-02-27 18:12:00 -0.142264 -0.049462  0.482493       10
2013-02-27 18:13:00  0.132617  0.737902 -0.347620       10
2013-02-27 18:14:00  1.277257  0.083401  0.649422       10
2013-02-27 18:15:00  0.048120  0.421220  0.149372       20
2013-02-27 18:16:00  0.812317 -1.517389  2.035487       20
2013-02-27 18:17:00 -0.058959 -0.034876 -1.535118       20
2013-02-27 18:18:00 -0.666227  0.040208 -1.042464       20
2013-02-27 18:19:00 -0.077031 -0.158351 -0.441992       20
2013-02-27 18:20:00  0.103083 -0.129341  0.294073       30
2013-02-27 18:21:00  0.900802  0.443271 -0.946229       30
2013-02-27 18:22:00  0.744631 -0.058666 -0.386226       30
2013-02-27 18:23:00 -0.064313  0.500321 -0.536237       30
2013-02-27 18:24:00 -0.392653  0.789827  0.000109       30
2013-02-27 18:25:00  1.926765  0.252259 -0.051475       40
2013-02-27 18:26:00 -0.035577  0.559222 -0.290751       40
2013-02-27 18:27:00  1.726165  0.626515 -0.868177       40
2013-02-27 18:28:00  1.269409  1.520980 -0.181637       40
2013-02-27 18:29:00 -1.151166 -0.300196  0.420747       40
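To make the target concrete, here is a minimal sketch of one possible way to produce such a column. It assumes a pandas version in which `Series.reindex` accepts `method="nearest"`, and it uses an explicit date (an assumption made only so the run is reproducible). It is one direction to explore, not necessarily the best one.

```python
import numpy as np
import pandas as pd

# Rebuild the example data with an explicit date so the run is reproducible.
index = pd.date_range("2013-02-27 18:10", periods=20, freq="min")
df = pd.DataFrame(np.random.randn(20, 3), columns=list("abc"), index=index)

calindex = pd.date_range("2013-02-27 18:12:30", periods=4, freq="5min")
caldata = pd.Series([10, 20, 30, 40], index=calindex)

# For every measurement timestamp, look up the calibration value whose
# timestamp is nearest in time. This requires caldata's index to be sorted.
df["offsets"] = caldata.reindex(df.index, method="nearest")
```

Timestamps such as 18:15:00 lie exactly halfway between two calibrations. As far as I understand the pandas documentation, such ties are resolved in favour of the later calibration timestamp, which is what the table above shows.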
I believe I understand well how to apply values to other columns via .map, .apply, etc.; it is the time or offset trickery apparently needed to distribute the values that I don't know where to start with.
Should it maybe be attacked with pandas.DateOffsets? Is there machinery to minimize time-deltas inside pandas somewhere?
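To spell out what "minimizing time-deltas" would mean here, the following is a brute-force sketch in plain NumPy rather than any dedicated pandas machinery. The names `meas_times`, `cal_times`, `cal_values` and `nearest` are hypothetical, and the explicit date is an assumption for reproducibility.

```python
import numpy as np
import pandas as pd

meas_times = pd.date_range("2013-02-27 18:10", periods=20, freq="min")
cal_times = pd.date_range("2013-02-27 18:12:30", periods=4, freq="5min")
cal_values = np.array([10, 20, 30, 40])

# Absolute time difference for every (measurement, calibration) pair,
# giving an array of shape (20, 4).
deltas = np.abs(meas_times.values[:, None] - cal_times.values[None, :])

# For each measurement, the position of the calibration with the smallest delta.
nearest = deltas.argmin(axis=1)
offsets = pd.Series(cal_values[nearest], index=meas_times)
```

This builds the full pairwise matrix, so it only suits small inputs. Note also that `argmin` returns the first minimum, so an exact tie goes to the earlier calibration: 18:15:00 would get 10 here, not the 20 shown in the desired result.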
I would appreciate a nudge in the right direction. It doesn't have to be complete at all, just the direction I need to go in.