I have stock price and volume data that is timestamped at irregular intervals and has duplicate time indices. A simple example of such data:

                       unixtime    price  amount
2011-04-17 01:03:11  1303002191  1.02570       1
2011-04-17 01:03:14  1303002194  1.02570       1
2011-04-17 01:03:17  1303002197  1.02570       1
2011-04-17 01:03:19  1303002199  1.02570       1
2011-04-17 01:03:21  1303002201  1.02570       1
2011-04-17 01:03:23  1303002203  1.02570       1
2011-04-17 01:03:37  1303002217  1.02570       1
2011-04-17 01:03:45  1303002225  1.02570       1
2011-04-17 01:03:57  1303002237  1.02570       1
2011-04-17 01:04:42  1303002282  1.02570       1
2011-04-17 01:04:55  1303002295  1.02570       1
2011-04-17 01:05:00  1303002300  1.02570       1
2011-04-17 01:05:03  1303002303  1.02570       1
2011-04-17 01:05:11  1303002311  1.02570       1
2011-04-17 01:05:24  1303002324  1.02570       1
2011-04-17 01:05:34  1303002334  1.02570       1
2011-04-17 01:05:45  1303002345  1.02570       1
2011-04-17 01:05:56  1303002356  1.02570       1
2011-04-17 01:06:11  1303002371  1.02570       1
2011-04-17 01:06:25  1303002385  1.02570       1
2011-04-17 01:06:28  1303002388  1.02570       1
2011-04-17 01:06:31  1303002391  1.02570       1
2011-04-17 01:06:33  1303002393  1.02570       1
2011-04-17 01:06:34  1303002394  1.02560       1
2011-04-17 01:06:44  1303002404  1.02560       1
2011-04-17 01:07:02  1303002422  1.02560       2
2011-04-17 01:07:21  1303002441  1.02563       2
2011-04-17 01:07:46  1303002466  1.02563       2
2011-04-17 01:08:24  1303002504  1.02563       2
2011-04-17 01:09:55  1303002595  1.02570       2
2011-04-17 01:10:50  1303002650  1.02570       2
2011-04-17 01:11:02  1303002662  1.02570       2

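In this case, what I want is an evenly spaced series, say at a 30-second frequency, holding the volume (amount)-weighted average price. I have already been able to get evenly spaced 30-second intervals with the last price in each interval and the total volume over each interval, using how="last" and how="sum" respectively. But how do I resample to get the volume-weighted price for each 30-second interval?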

1 Answer

I think I would create a new column for the total traded value (price times amount) and resample twice:

In [11]: df['total'] = df['price'] * df['amount']

In [12]: df.total.resample('30S').sum() / df.amount.resample('30S').sum()
Out[12]:
2011-04-17 01:03:00    1.025700
2011-04-17 01:03:30    1.025700
2011-04-17 01:04:00         NaN
2011-04-17 01:04:30    1.025700
2011-04-17 01:05:00    1.025700
2011-04-17 01:05:30    1.025700
2011-04-17 01:06:00    1.025700
2011-04-17 01:06:30    1.025650
2011-04-17 01:07:00    1.025615
2011-04-17 01:07:30    1.025630
2011-04-17 01:08:00    1.025630
2011-04-17 01:08:30         NaN
2011-04-17 01:09:00         NaN
2011-04-17 01:09:30    1.025700
2011-04-17 01:10:00         NaN
2011-04-17 01:10:30    1.025700
2011-04-17 01:11:00    1.025700
Freq: 30S, dtype: float64

Assuming that's what you're after...
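
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It is not the only way to do it: it assumes a reasonably recent pandas, where the aggregation is called as a method on the resampler and the `"30s"` frequency alias is accepted, and the five-row frame is a made-up subset of the question's data used only for illustration.

```python
import pandas as pd

# Made-up subset of the question's data, for illustration only.
idx = pd.to_datetime([
    "2011-04-17 01:06:31",
    "2011-04-17 01:06:34",
    "2011-04-17 01:07:02",
    "2011-04-17 01:07:21",
    "2011-04-17 01:08:24",
])
df = pd.DataFrame(
    {
        "price": [1.02570, 1.02560, 1.02560, 1.02563, 1.02563],
        "amount": [1, 1, 2, 2, 2],
    },
    index=idx,
)

# Traded value per row: price times amount.
df["total"] = df["price"] * df["amount"]

# Sum both columns over 30-second bins in a single resample pass.
sums = df[["total", "amount"]].resample("30s").sum()

# Volume-weighted average price per bin; bins without trades
# come out as 0 / 0, which pandas reports as NaN.
vwap = sums["total"] / sums["amount"]
print(vwap)
```

Bins with no trades stay NaN here. Whether to leave them that way or carry the previous price forward (for example with `vwap.ffill()`) depends on what the series is used for.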

Answered 2013-06-05T20:59:11.683