I have a bunch of tick data, and I can successfully resample it into time-based bars using the following:

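import pandas as pd

h5_file = pd.HDFStore(h5_path)
grouped = h5_file['fx_data'].groupby('Symbol')
# The older how='ohlc' keyword was deprecated and later removed from pandas;
# the method form below is the current spelling.
ask = grouped['Ask'].resample('5Min').ohlc()
bid = grouped['Bid'].resample('5Min').ohlc()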

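But I would also like to return the tick volume. This should simply be a count of the rows that make up each sample. What is the best way to do this?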

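Also, when I resample to a smaller timeframe, I occasionally get bars whose values are N/A because there was no price change during that period. When this happens, I would like the previous close to be used as the OHLC value of the current bar.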

I searched and found this code:

whatev.groupby('Symbol')
closes = resampledData['close'].fillna(method='pad')
resampledData.apply(lambda x: x.fillna(closes))

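I am very new to Python and to programming, and I don't understand lambdas yet. Will this change only the close values, or all of the values I need changed? Any help is greatly appreciated.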

1 Answer

I happen to have a copy of some sample FX data (USD/EUR for May 2015) in HDF5, so I will use it here for illustration.

import pandas as pd

Jian_h5 = '/media/Primary Disk/Jian_Python_Data_Storage.h5'
h5_file = pd.HDFStore(Jian_h5)

fx_df = h5_file['fx_tick_data']
# I've only got USD/EUR in this dataset, but let's still do a groupby symbol
# and assume you have multiple symbols
grouped = fx_df.groupby('Symbol')

# calculate sub-group average bid and ask price, and also number of ticks
freq = '1min'
# an empty DataFrame
result = pd.DataFrame()
# bid/ask price: forward fill makes sense
# (the older resample(freq, how='mean') keyword form was removed from pandas)
result['avg_bid'] = grouped['Bid'].resample(freq).mean().ffill()
result['avg_ask'] = grouped['Ask'].resample(freq).mean().ffill()
# tick count: any NaN should be replaced by zero
result['tick_counts'] = grouped['Ask'].resample(freq).count().fillna(0)

Out[59]: 
                             avg_bid  avg_ask  tick_counts
Symbol  Date_time                                         
EUR/USD 2015-05-01 00:00:00   1.1210   1.1210           77
        2015-05-01 00:01:00   1.1209   1.1210          117
        2015-05-01 00:02:00   1.1209   1.1210           95
        2015-05-01 00:03:00   1.1210   1.1210           46
        2015-05-01 00:04:00   1.1211   1.1211          112
        2015-05-01 00:05:00   1.1213   1.1213          193
        2015-05-01 00:06:00   1.1214   1.1215           76
        2015-05-01 00:07:00   1.1216   1.1216          103
        2015-05-01 00:08:00   1.1216   1.1217          107
        2015-05-01 00:09:00   1.1217   1.1217           17
        2015-05-01 00:10:00   1.1216   1.1217           33
        2015-05-01 00:11:00   1.1218   1.1218           56
        2015-05-01 00:12:00   1.1217   1.1218           77
        2015-05-01 00:13:00   1.1215   1.1215           18
        2015-05-01 00:14:00   1.1215   1.1216           50
...                              ...      ...          ...
        2015-05-31 23:45:00   1.0959   1.0960           37
        2015-05-31 23:46:00   1.0959   1.0959           59
        2015-05-31 23:47:00   1.0958   1.0959           62
        2015-05-31 23:48:00   1.0956   1.0957           45
        2015-05-31 23:49:00   1.0955   1.0956           67
        2015-05-31 23:50:00   1.0955   1.0956           36
        2015-05-31 23:51:00   1.0955   1.0956           35
        2015-05-31 23:52:00   1.0956   1.0956           22
        2015-05-31 23:53:00   1.0956   1.0957           29
        2015-05-31 23:54:00   1.0957   1.0958           50
        2015-05-31 23:55:00   1.0956   1.0957           30
        2015-05-31 23:56:00   1.0957   1.0958            8
        2015-05-31 23:57:00   1.0957   1.0958           45
        2015-05-31 23:58:00   1.0957   1.0958           38
        2015-05-31 23:59:00   1.0958   1.0958           30

[44640 rows x 3 columns]
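
The question also asks for OHLC bars in which an empty bar takes the previous close, which the averages above do not cover. Below is a minimal sketch of one way that might be done, not a definitive implementation. The `ticks` frame is made-up data and `make_bars` is a hypothetical helper, neither comes from the original dataset. It assumes the ticks are indexed by timestamp and that filling per symbol is what is wanted. Note that it fills all four OHLC columns from the forward-filled close, not just the close column.

```python
import pandas as pd

# Hypothetical tick data: two symbols, with a gap so one 5-minute bar is empty.
ticks = pd.DataFrame(
    {
        "Symbol": ["EUR/USD"] * 4 + ["GBP/USD"] * 2,
        "Bid": [1.1210, 1.1212, 1.1209, 1.1215, 1.5300, 1.5310],
    },
    index=pd.to_datetime(
        [
            "2015-05-01 00:00:10",
            "2015-05-01 00:02:30",
            "2015-05-01 00:04:50",
            "2015-05-01 00:11:00",  # no EUR/USD ticks between 00:05 and 00:10
            "2015-05-01 00:01:00",
            "2015-05-01 00:03:00",
        ]
    ),
)


def make_bars(prices, freq="5min"):
    """Build OHLC bars plus a tick count for one symbol's price series."""
    resampled = prices.resample(freq)
    bars = resampled.ohlc()                  # columns: open, high, low, close
    bars["tick_count"] = resampled.count()   # empty bars count as 0
    # Carry the last known close forward into the empty bars ...
    closes = bars["close"].ffill()
    # ... and use it for every OHLC column that is still NaN.
    for col in ["open", "high", "low", "close"]:
        bars[col] = bars[col].fillna(closes)
    return bars


# Build bars one symbol at a time so a fill never crosses from one symbol to another.
per_symbol = {sym: make_bars(grp["Bid"]) for sym, grp in ticks.groupby("Symbol")}
result = pd.concat(per_symbol)
result.index.names = ["Symbol", "Date_time"]
print(result)
```

The same helper could be applied to the `Ask` column to get ask bars. An explicit loop over the four columns is used here instead of a lambda so that it is clear which values change.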
Answered 2015-07-02T19:15:31.103