I have some time-series data (financial stock trading data):

TIMESTAMP    PRICE     VOLUME
1294311545    24990  1500000000
1294317813    25499  5000000000
1294318449    25499   100000000

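I need to convert it into OHLC values (open, high, low, close) based on the PRICE column, as a JSON list, and display them as an OHLC chart with the Highstock JS framework. The output should look like this: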

[{'time':'2013-09-01','open':24999,'high':25499,'low':24999,'close':25000,'volume':15000000},
 {'time':'2013-09-02','open':24900,'high':25600,'low':24800,'close':25010,'volume':16000000},
 {...}]

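For example, if my sample contains 10 data points for the day 2013-09-01, the output should contain a single object for that day: high is the highest price across all 10 points, low is the lowest price, open is the first price of the day, close is the last price of the day, and volume is the total volume of all 10 points.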
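The per-day aggregation described above can be sketched in plain Python, without pandas. This is a minimal illustration only: it reuses the three sample rows from the question and assumes days are bucketed in UTC (the question does not say which timezone defines a trading day); `daily_ohlc` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Sample rows from the question: (unix seconds, price, volume)
rows = [
    (1294311545, 24990, 1500000000),
    (1294317813, 25499, 5000000000),
    (1294318449, 25499, 100000000),
]

def daily_ohlc(rows):
    """Group rows by UTC calendar day and build one OHLC + volume dict per day."""
    days = {}  # insertion-ordered (Python 3.7+), so days come out chronologically
    for ts, price, volume in sorted(rows):  # sort by timestamp: open = first, close = last
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        bar = days.get(day)
        if bar is None:
            days[day] = {"time": day, "open": price, "high": price,
                         "low": price, "close": price, "volume": volume}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += volume
    return list(days.values())

print(daily_ohlc(rows))
```

All three sample rows fall on 2011-01-06 (UTC), so this yields a single bar whose volume is the sum of the three volumes.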

I know there is a Python library, pandas, that can probably do this, but I still haven't managed to get it working.

Update: as suggested, I used resample() like this:

df['VOLUME'].resample('H', how='sum')
df['PRICE'].resample('H', how='ohlc')

But how do I merge the results?
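One way to merge the two results, sketched under the assumption of a newer pandas release, where the `how=` keyword is no longer accepted and has been replaced by method calls such as `.resample(...).ohlc()` and `.resample(...).sum()`. Both resampled objects share the same DatetimeIndex, so `pd.concat(..., axis=1)` lines them up column by column. The frame is rebuilt from the question's sample rows, and `"60min"` stands in for `'H'`, which recent pandas versions deprecate.

```python
import pandas as pd

# The question's sample data, indexed by (naive, UTC) timestamps
df = pd.DataFrame(
    {"PRICE": [24990, 25499, 25499],
     "VOLUME": [1500000000, 5000000000, 100000000]},
    index=pd.to_datetime([1294311545, 1294317813, 1294318449], unit="s"),
)

# Method-call equivalents of resample('H', how='ohlc') and resample('H', how='sum')
price_ohlc = df["PRICE"].resample("60min").ohlc()
volume_sum = df["VOLUME"].resample("60min").sum()

# Same index on both sides, so a column-wise concat aligns the bins
merged = pd.concat([price_ohlc, volume_sum.rename("volume")], axis=1)
print(merged)
```

Empty hours show NaN in the OHLC columns but 0 in `volume`, because `sum()` of an empty bin is 0 by default.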


At the moment you can only apply ohlc to a column/Series (this will be fixed in 0.13).
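For reference, later pandas releases do support the DataFrame case: calling `ohlc()` on a resampled DataFrame returns MultiIndex columns of the form (column, field). A minimal sketch with the question's data, assuming a current pandas version:

```python
import pandas as pd

df = pd.DataFrame(
    {"PRICE": [24990, 25499, 25499],
     "VOLUME": [1500000000, 5000000000, 100000000]},
    index=pd.to_datetime([1294311545, 1294317813, 1294318449], unit="s"),
)

# ohlc() on a resampled DataFrame: one open/high/low/close group per column
bars = df.resample("D").ohlc()
print(bars["PRICE"])  # the usual open/high/low/close frame for PRICE
```

This also computes OHLC for VOLUME, which is usually not what a chart needs; summing VOLUME separately and joining it to the PRICE bars is typically more useful.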

First, convert the TIMESTAMP column to pandas Timestamps:

In [11]: df.TIMESTAMP = pd.to_datetime(df.TIMESTAMP, unit='s')

In [12]: df.set_index('TIMESTAMP', inplace=True)

In [13]: df
Out[13]:
                     PRICE      VOLUME
TIMESTAMP
2011-01-06 10:59:05  24990  1500000000
2011-01-06 12:43:33  25499  5000000000
2011-01-06 12:54:09  25499   100000000

Resample with ohlc (here I resample by hour):

In [14]: df['VOLUME'].resample('H', how='ohlc')
Out[14]:
                           open        high         low       close
TIMESTAMP
2011-01-06 10:00:00  1500000000  1500000000  1500000000  1500000000
2011-01-06 11:00:00         NaN         NaN         NaN         NaN
2011-01-06 12:00:00  5000000000  5000000000   100000000   100000000

In [15]: df['PRICE'].resample('H', how='ohlc')
Out[15]:
                      open   high    low  close
TIMESTAMP
2011-01-06 10:00:00  24990  24990  24990  24990
2011-01-06 11:00:00    NaN    NaN    NaN    NaN
2011-01-06 12:00:00  25499  25499  25499  25499
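The empty 11:00 bin above shows up as a row of NaN. If the chart should only contain hours with trades, one option is to drop all-NaN rows. A sketch using the newer method-call resample API and the question's prices (`"60min"` is used in place of `'H'`):

```python
import pandas as pd

prices = pd.Series(
    [24990, 25499, 25499],
    index=pd.to_datetime([1294311545, 1294317813, 1294318449], unit="s"),
    name="PRICE",
)

hourly = prices.resample("60min").ohlc()   # bins 10:00, 11:00 (empty -> NaN), 12:00
hourly_traded = hourly.dropna(how="all")   # keep only hours that had trades
print(hourly_traded)
```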

You can apply to_json to any DataFrame:

In [16]: df['PRICE'].resample('H', how='ohlc').to_json()
Out[16]: '{"open":{"1294308000000000000":24990.0,"1294311600000000000":null,"1294315200000000000":25499.0},"high":{"1294308000000000000":24990.0,"1294311600000000000":null,"1294315200000000000":25499.0},"low":{"1294308000000000000":24990.0,"1294311600000000000":null,"1294315200000000000":25499.0},"close":{"1294308000000000000":24990.0,"1294311600000000000":null,"1294315200000000000":25499.0}}'

*Supporting this directly on a DataFrame is probably a straightforward enhancement; it just isn't implemented at the moment.
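The default column-oriented JSON above nests values under each field and then under each timestamp. If the chart is easier to feed with one array per bar, Highstock's OHLC series can also take data as `[x, open, high, low, close]` arrays with x in milliseconds since the epoch (worth confirming in the Highstock API reference for the version in use). A sketch of building that list, assuming a current pandas version and the question's sample prices; `highstock_data` is just an illustrative name:

```python
import json
import pandas as pd

prices = pd.Series(
    [24990, 25499, 25499],
    index=pd.to_datetime([1294311545, 1294317813, 1294318449], unit="s"),
)
daily = prices.resample("D").ohlc().dropna()  # drop days without trades

# Milliseconds since the Unix epoch, computed without relying on the
# index's internal resolution
epoch_ms = (daily.index - pd.Timestamp("1970-01-01")) // pd.Timedelta(milliseconds=1)

# One [x, open, high, low, close] list per bar; tolist() yields plain Python numbers
highstock_data = [
    [int(ms), *values]
    for ms, values in zip(epoch_ms, daily[["open", "high", "low", "close"]].to_numpy().tolist())
]
print(json.dumps(highstock_data))
```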

Update: your desired output (or at least something very close to it) can be produced as follows:

In [21]: price = df['PRICE'].resample('D', how='ohlc').reset_index()

In [22]: price
Out[22]: 
            TIMESTAMP   open   high    low  close
0 2011-01-06 00:00:00  24990  25499  24990  25499

Using the records orient and the iso date_format:

In [23]: price.to_json(date_format='iso', orient='records')
Out[23]: '[{"TIMESTAMP":"2011-01-06T00:00:00.000Z","open":24990,"high":25499,"low":24990,"close":25499}]'

In [24]: price.to_json('foo.json', date_format='iso', orient='records')  # save as json file
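The output above still lacks the volume field from the question. A minimal sketch of adding it, assuming a current pandas version (method-call resample API): sum VOLUME per day, attach it to the daily PRICE bars, format the index as plain date strings, and serialize as records. Days here are UTC days, since the timestamps carry no timezone.

```python
import json
import pandas as pd

df = pd.DataFrame(
    {"PRICE": [24990, 25499, 25499],
     "VOLUME": [1500000000, 5000000000, 100000000]},
    index=pd.to_datetime([1294311545, 1294317813, 1294318449], unit="s"),
)

daily = df["PRICE"].resample("D").ohlc()
daily["volume"] = df["VOLUME"].resample("D").sum()  # aligned on the daily index
daily = daily.dropna()                               # drop days without trades

# Plain 'YYYY-MM-DD' strings, like the 'time' field in the question
daily.index = daily.index.strftime("%Y-%m-%d")
daily.index.name = "time"

payload = daily.reset_index().to_json(orient="records")
print(payload)
```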
answered 2013-09-03T15:06:45.650