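I have some OHLCV data stored in TimescaleDB, with data missing for certain time ranges. This data needs to be resampled to a different period (here, 1 day) and must contain contiguous, ordered time buckets.
TimescaleDB provides the time_bucket_gapfill function for this. My current query is: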
SELECT
    time_bucket_gapfill(
        '1 day',
        "timestamp",
        '2017-07-25 00:00',
        '2018-01-01 00:00'
    ) AS date,
    FIRST(open, "timestamp") AS open,
    MAX(high) AS high,
    MIN(low) AS low,
    LAST(close, "timestamp") AS close,
    SUM(volume) AS volume
FROM ohlcv
WHERE "timestamp" > '2017-07-25'
GROUP BY date
ORDER BY date ASC
LIMIT 10
The result is:
date open high low close volume
2017-07-25 00:00:00+00
2017-07-26 00:00:00+00
2017-07-27 00:00:00+00 0.00992 0.010184 0.009679 0.010039 65553.5299999999
2017-07-28 00:00:00+00 0.00999 0.010059 0.009225 0.009248 43049.93
2017-07-29 00:00:00+00
2017-07-30 00:00:00+00 0.009518 0.0098 0.009286 0.009457 40510.0599999999
...
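Question: It looks like only the date column has been gap-filled. By modifying the SQL statement, is it possible to also gap-fill the open, high, low, close, and volume columns, so that we get this result: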
date open high low close volume
2017-07-25 00:00:00+00 0 0 0 0 0
2017-07-26 00:00:00+00 0 0 0 0 0
2017-07-27 00:00:00+00 0.00992 0.010184 0.009679 0.010039 65553.5299999999
2017-07-28 00:00:00+00 0.00999 0.010059 0.009225 0.009248 43049.93
2017-07-29 00:00:00+00 0.009248 0.009248 0.009248 0.009248 0
2017-07-30 00:00:00+00 0.009518 0.0098 0.009286 0.009457 40510.0599999999
...
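Or is it recommended to perform this data imputation after receiving the query results, e.g. in Python or Node.js?
I would prefer to do this gap filling/imputation in TimescaleDB rather than in my Node.js app, because doing it in Node.js would be much slower, and I don't want to introduce Python into the app just for this processing step.
Example of how it can be done using Python/pandas: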
import pandas as pd
# Building the test dataset simulating missing values after time_bucket
data = [
(pd.Timestamp('2020-01-01'), None, None, None, None, None),
(pd.Timestamp('2020-01-02'), 100, 110, 90, 95, 3),
(pd.Timestamp('2020-01-03'), None, None, None, None, None),
(pd.Timestamp('2020-01-04'), 98, 150, 100, 100, 4),
]
df = pd.DataFrame(data, columns=['date', 'open' , 'high', 'low', 'close', 'volume']).set_index('date')
# open high low close volume
# date
# 2020-01-01 NaN NaN NaN NaN NaN
# 2020-01-02 100.0 110.0 90.0 95.0 3.0
# 2020-01-03 NaN NaN NaN NaN NaN
# 2020-01-04 98.0 150.0 100.0 100.0 4.0
# Perform gap filling
df['close'] = df['close'].ffill()            # carry the last known close forward
df['volume'] = df['volume'].fillna(0)        # no trades in a gap, so volume is 0
df['open'] = df['open'].fillna(df['close'])  # missing open takes the forward-filled close
df['high'] = df['high'].fillna(df['close'])  # missing high takes the forward-filled close
df['low'] = df['low'].fillna(df['close'])    # missing low takes the forward-filled close
df = df.fillna(0)                            # leading gaps with no previous close become 0
# open high low close volume
# date
# 2020-01-01 0.0 0.0 0.0 0.0 0.0
# 2020-01-02 100.0 110.0 90.0 95.0 3.0
# 2020-01-03 95.0 95.0 95.0 95.0 0.0
# 2020-01-04 98.0 150.0 100.0 100.0 4.0
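
For comparison, here is a rough sketch of how the same fill rules might be expressed on the TimescaleDB side. It is untested against this dataset and rests on a few assumptions: that the installed TimescaleDB version provides the locf() gapfill function, that ohlcv has no column of its own named date, and that the names filled and prev_close (introduced here purely for illustration) do not clash with anything. The inner query gap-fills the buckets and carries the last close forward. The outer query then applies the same COALESCE logic as the pandas version.

```sql
-- Sketch only: "filled" and "prev_close" are illustrative names, not part of the schema.
WITH filled AS (
    SELECT
        time_bucket_gapfill(
            '1 day',
            "timestamp",
            '2017-07-25 00:00',
            '2018-01-01 00:00'
        ) AS date,
        FIRST(open, "timestamp") AS open,
        MAX(high) AS high,
        MIN(low) AS low,
        -- last observed close carried forward into empty buckets;
        -- stays NULL for leading gaps with no earlier value
        locf(LAST(close, "timestamp")) AS prev_close,
        SUM(volume) AS volume
    FROM ohlcv
    WHERE "timestamp" > '2017-07-25'
    GROUP BY date
)
SELECT
    date,
    COALESCE(open, prev_close, 0) AS open,   -- gap: previous close, else 0
    COALESCE(high, prev_close, 0) AS high,
    COALESCE(low, prev_close, 0) AS low,
    COALESCE(prev_close, 0) AS close,
    COALESCE(volume, 0) AS volume            -- gap: no trades
FROM filled
ORDER BY date ASC
LIMIT 10;
```

Splitting the query in two keeps locf() as a top-level column expression in the gapfill query and leaves the COALESCE calls as ordinary SQL over its result, which avoids relying on how gapfill treats nested expressions.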