I have two data sets that I am trying to compare. One contains measured meteorological values, recorded approximately every 15 minutes but not at a consistent time each hour (e.g. 12:03, 1:05, 2:01). The other contains modelled data for the same location, exactly on the hour. I would like to extract the measured value that occurs closest to each hour mark and join it with the modelled data.
I currently have both sets as DataFrames and have created an hourly time series to use as an index. Does anyone know of an easy way to align these without looping through all the data?
Thanks.
Using the df.resample('H', how='ohlc') method, I get the following error:
Traceback (most recent call last):
  File "<pyshell#81>", line 1, in <module>
    df.resample('H', how='ohlc')
  File "C:\Python33\lib\site-packages\pandas\core\generic.py", line 290, in resample
    return sampler.resample(self)
  File "C:\Python33\lib\site-packages\pandas\tseries\resample.py", line 83, in resample
    rs = self._resample_timestamps(obj)
  File "C:\Python33\lib\site-packages\pandas\tseries\resample.py", line 226, in _resample_timestamps
    result = grouped.aggregate(self._agg_method)
  File "C:\Python33\lib\site-packages\pandas\core\groupby.py", line 1695, in aggregate
    return getattr(self, arg)(*args, **kwargs)
  File "C:\Python33\lib\site-packages\pandas\core\groupby.py", line 427, in ohlc
    return self._cython_agg_general('ohlc')
  File "C:\Python33\lib\site-packages\pandas\core\groupby.py", line 1618, in _cython_agg_general
    new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
  File "C:\Python33\lib\site-packages\pandas\core\groupby.py", line 1656, in _cython_agg_blocks
    result, _ = self.grouper.aggregate(values, how, axis=agg_axis)
  File "C:\Python33\lib\site-packages\pandas\core\groupby.py", line 818, in aggregate
    raise NotImplementedError
NotImplementedError
A sample of my dataframe is shown below:
D
2008-01-01 00:01:00 274.261108
2008-01-01 00:11:00 273.705566
2008-01-01 00:31:00 273.705566
2008-01-01 00:41:00 273.705566
2008-01-01 01:01:00 273.705566
2008-01-01 01:11:00 273.705566
2008-01-01 01:31:00 273.705566
2008-01-01 01:41:00 273.705566
2008-01-01 02:01:00 273.705566
2008-01-01 02:11:00 273.149994
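To make the goal concrete, here is a minimal sketch of the result I am after, built from the sample rows above. It assumes a newer pandas release in which reindex accepts method='nearest' together with a tolerance (this is not the version shown in the traceback). The names measured, hourly and aligned, and the 30-minute tolerance, are illustrative assumptions only.

```python
import pandas as pd

# Measured values copied from the sample above (irregular timestamps).
measured = pd.DataFrame(
    {"D": [274.261108, 273.705566, 273.705566, 273.705566, 273.705566,
           273.705566, 273.705566, 273.705566, 273.705566, 273.149994]},
    index=pd.to_datetime([
        "2008-01-01 00:01", "2008-01-01 00:11", "2008-01-01 00:31",
        "2008-01-01 00:41", "2008-01-01 01:01", "2008-01-01 01:11",
        "2008-01-01 01:31", "2008-01-01 01:41", "2008-01-01 02:01",
        "2008-01-01 02:11",
    ]),
)

# Hourly index standing in for the modelled data (exactly on the hour).
hourly = pd.date_range("2008-01-01 00:00", periods=3, freq="60min")

# For each hour mark, take the measured row with the nearest timestamp.
# The tolerance (an assumed 30 minutes) leaves NaN when nothing is close enough.
aligned = measured.reindex(
    hourly, method="nearest", tolerance=pd.Timedelta(minutes=30)
)
print(aligned)
```

With the sample data, each hour mark picks up the reading taken one minute past the hour. The aligned frame shares the hourly index, so it could then be joined to the modelled DataFrame on that index.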
EDIT: It appears this may be an error specific to Python 3.3. Can anyone confirm this?