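I am trying to compute a rolling beta between two columns of a DataFrame.
To clarify: in finance, beta is classically defined as cov(asset_1, asset_2) / var(asset_2). In my case, I want to compute the rolling beta with a Kalman filter, and initialize that Kalman filter with the classical beta formula.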
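For reference, the classical formula can be sketched with NumPy alone. The helper name classic_beta and the sample numbers are illustrative assumptions. One detail to watch: np.cov normalizes by n - 1 by default while np.var normalizes by n, so ddof is passed explicitly here to keep the ratio consistent.

```python
import numpy as np

def classic_beta(asset_1, asset_2):
    """Hypothetical helper: cov(asset_1, asset_2) / var(asset_2)."""
    a1 = np.asarray(asset_1, dtype=float)
    a2 = np.asarray(asset_2, dtype=float)
    # Use the same normalization (n - 1) for covariance and variance
    cov = np.cov(a1, a2, ddof=1)[0, 1]
    return cov / np.var(a2, ddof=1)

# Made-up returns for illustration only: asset_1 is an exact linear function of asset_2
a2 = np.array([0.010, -0.020, 0.015, 0.005, -0.010])
a1 = 0.5 * a2 + 0.001
beta = classic_beta(a1, a2)
```

With consistent normalization, this ratio is the ordinary least squares slope of asset_1 regressed on asset_2.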
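My problem is the following: I have a DataFrame with two columns (asset_1 and asset_2), and a function beta_kalman(s1, s2) that uses pykalman.
beta_kalman takes two Series as arguments: s1=asset_1 and s2=asset_2.
Currently I use a simple for loop that shifts the window by one index position per iteration and passes each asset's values within that window to my function.
The problem is that this takes too long to run.
Is there a way to reduce the computation time? I was imagining something like df.rolling(window).apply(beta_kalman), but that does not work.
Do you have any idea for a solution?
Note: I have to implement this with Python 3.6.3 and pandas 0.20.3.
Thanks in advance for your ideas!
Sample data and the minimal code needed to reproduce the problem: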
from pandas import Timestamp
import pandas as pd
import numpy as np
from pykalman import KalmanFilter
My function that computes beta with pykalman:
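def beta_kalman(s1, s2, delta=1e-2):
    # Classical beta as the starting value; ddof=1 matches np.cov's default normalization
    beta_init = np.cov(s1, s2)[0, 1] / np.var(s2, ddof=1)
    # State noise covariance for the random-walk (beta, intercept) state
    trans_cov = delta / (1 - delta) * np.eye(2)
    # One observation row [s2_t, 1] per time step, shape (n, 1, 2)
    obs_mat = np.vstack([s2, np.ones(s2.shape)]).T[:, np.newaxis]
    kf = KalmanFilter(n_dim_obs=1,  # 1-D observation (asset_1)
                      n_dim_state=2,  # 2-D state (beta, intercept)
                      initial_state_mean=[beta_init, 0],
                      initial_state_covariance=np.ones((2, 2)),
                      transition_matrices=np.eye(2),
                      observation_matrices=obs_mat,
                      observation_covariance=2,
                      transition_covariance=trans_cov,
                      )
    state_means, _ = kf.filter(s1.values)
    return state_means[:, 0]  # filtered beta at each time step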
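Reading off the arguments passed to KalmanFilter, the function appears to encode the linear state-space model below, where y_t is the asset_1 return, x_t is the asset_2 return, and the state holds a time-varying beta and intercept. The symbols are notation introduced here for illustration, not something pykalman defines.

```latex
\begin{aligned}
\theta_t &= \begin{pmatrix}\beta_t \\ \alpha_t\end{pmatrix}, \qquad
\theta_t = \theta_{t-1} + w_t, \quad
w_t \sim \mathcal{N}\!\left(0,\ \tfrac{\delta}{1-\delta}\, I_2\right), \\
y_t &= \begin{pmatrix} x_t & 1 \end{pmatrix}\theta_t + v_t, \quad
v_t \sim \mathcal{N}(0,\ 2), \\
\theta_0 &\sim \mathcal{N}\!\left(\begin{pmatrix}\beta_{\text{init}} \\ 0\end{pmatrix},\ \mathbf{1}\mathbf{1}^{\top}\right)
\end{aligned}
```

Here the all-ones matrix corresponds to initial_state_covariance=np.ones((2, 2)), and the observation noise variance 2 corresponds to observation_covariance=2.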
Sample data:
df_returns = {'asset_1': {Timestamp('2015-02-02 00:00:00', freq='B'): -0.00065638527967171179,
Timestamp('2015-02-03 00:00:00', freq='B'): 0.0022343530982782411,
Timestamp('2015-02-04 00:00:00', freq='B'): 0.00047087917232135901,
Timestamp('2015-02-05 00:00:00', freq='B'): 0.00068940734601552478,
Timestamp('2015-02-06 00:00:00', freq='B'): 0.001155443533138456,
Timestamp('2015-02-09 00:00:00', freq='B'): -0.00073878429513896116,
Timestamp('2015-02-10 00:00:00', freq='B'): 6.5331180920669141e-06,
Timestamp('2015-02-11 00:00:00', freq='B'): -0.00047848662447047552,
Timestamp('2015-02-12 00:00:00', freq='B'): 0.00075100030101071802,
Timestamp('2015-02-13 00:00:00', freq='B'): 0.0011705611535068883,
Timestamp('2015-02-16 00:00:00', freq='B'): 0.00051393092538964957,
Timestamp('2015-02-17 00:00:00', freq='B'): -0.00048847349932235051,
Timestamp('2015-02-18 00:00:00', freq='B'): 0.0012106608634878668,
Timestamp('2015-02-19 00:00:00', freq='B'): 0.0013241124699925333,
Timestamp('2015-02-20 00:00:00', freq='B'): 0.00071000350611760688,
Timestamp('2015-02-23 00:00:00', freq='B'): 0.0018171290896187298,
Timestamp('2015-02-24 00:00:00', freq='B'): 0.00239364252208496,
Timestamp('2015-02-25 00:00:00', freq='B'): 0.0015992532863815523,
Timestamp('2015-02-26 00:00:00', freq='B'): 0.0019965436705504658,
Timestamp('2015-02-27 00:00:00', freq='B'): 0.0011555193318930623},
'asset_2': {Timestamp('2015-02-02 00:00:00', freq='B'): 0.0055712469218620608,
Timestamp('2015-02-03 00:00:00', freq='B'): 0.01307503061081472,
Timestamp('2015-02-04 00:00:00', freq='B'): 0.0003952002997402726,
Timestamp('2015-02-05 00:00:00', freq='B'): -0.0017481486478068131,
Timestamp('2015-02-06 00:00:00', freq='B'): -0.0031670739060284392,
Timestamp('2015-02-09 00:00:00', freq='B'): -0.014835535182983417,
Timestamp('2015-02-10 00:00:00', freq='B'): 0.010569586765601269,
Timestamp('2015-02-11 00:00:00', freq='B'): -0.002657959034321089,
Timestamp('2015-02-12 00:00:00', freq='B'): 0.012883068432518074,
Timestamp('2015-02-13 00:00:00', freq='B'): 0.008773174815372986,
Timestamp('2015-02-16 00:00:00', freq='B'): -0.0041451490599345719,
Timestamp('2015-02-17 00:00:00', freq='B'): 0.0014955867933237332,
Timestamp('2015-02-18 00:00:00', freq='B'): 0.0079578196824314773,
Timestamp('2015-02-19 00:00:00', freq='B'): 0.0064298048361444149,
Timestamp('2015-02-20 00:00:00', freq='B'): 0.0007021736832582004,
Timestamp('2015-02-23 00:00:00', freq='B'): 0.0083229889997538109,
Timestamp('2015-02-24 00:00:00', freq='B'): 0.007817259530292997,
Timestamp('2015-02-25 00:00:00', freq='B'): -0.001499671169154615,
Timestamp('2015-02-26 00:00:00', freq='B'): 0.0093629482797668029,
Timestamp('2015-02-27 00:00:00', freq='B'): 0.0067304325978827517}}
df_returns = pd.DataFrame.from_dict(df_returns)
What I am currently doing, which works but takes a long time:
df_betas = pd.DataFrame(index=df_returns.index, columns=['BETA'])
window = 5
for i in range(len(df_returns.index) - window + 1):
    temp_returns = df_returns[['asset_1', 'asset_2']].iloc[i:i + window].copy()
    df_betas.loc[temp_returns.index[-1], 'BETA'] = beta_kalman(temp_returns['asset_1'], temp_returns['asset_2'])[-1]
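Much of the loop's cost is likely per-call overhead rather than arithmetic, since each window only involves 2x2 matrices. One possible way to cut it is to write the filter recursion directly in NumPy for this specific model. The sketch below is a stand-in built on assumptions, not pykalman's code: the names kalman_beta_numpy and obs_var are hypothetical, and it assumes an identity transition, a scalar observation, and that the initial mean acts as the prior for the first observation. Its output should be compared against the pykalman result before relying on it.

```python
import numpy as np

def kalman_beta_numpy(y, x, beta_init, delta=1e-2, obs_var=2.0):
    """Hypothetical NumPy-only filter for y_t = beta_t * x_t + alpha_t + noise."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    Q = delta / (1 - delta) * np.eye(2)   # random-walk state noise
    m = np.array([beta_init, 0.0])        # state mean: (beta, intercept)
    P = np.ones((2, 2))                   # state covariance
    betas = np.empty(len(y))
    for t in range(len(y)):
        if t > 0:
            P = P + Q                     # predict step (identity transition)
        H = np.array([x[t], 1.0])         # observation row [x_t, 1]
        S = H @ P @ H + obs_var           # innovation variance (scalar)
        K = P @ H / S                     # Kalman gain
        m = m + K * (y[t] - H @ m)        # update the state mean
        P = P - np.outer(K, H @ P)        # update the state covariance
        betas[t] = m[0]
    return betas

# Synthetic, noise-free data with a known slope of 1.5 (illustration only)
rng = np.random.RandomState(0)
x_demo = rng.normal(size=500)
y_demo = 1.5 * x_demo + 0.3
betas_demo = kalman_beta_numpy(y_demo, x_demo, beta_init=0.0, obs_var=1e-4)
```

With pandas data, each window's `.values` could be passed as y and x, with beta_init computed from the classical formula on that window.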
What I would like to do:
df_betas2 = df_returns.copy()
df_betas2 = df_betas2.rolling(window).apply(beta_kalman)
The error I get:
TypeError: beta_kalman() missing 1 required positional argument: 's2'
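The error makes sense, because apply passes only one argument to beta_kalman. The problem is that what apply passes is an array (shape=(5,)) holding the values of the first column (asset_1) only, not both columns.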
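One workaround sometimes used for two-column functions is to roll over a Series of integer row positions and let the applied function slice both columns itself. The sketch below assumes that pattern. To run without pykalman, it uses a hypothetical stand-in, two_col_stat (a plain covariance/variance ratio), in place of beta_kalman; the names roll_two_columns and two_col_stat are illustrative. This mainly tidies up the loop: the function is still called once per window, so a large speed-up should not be expected from it alone.

```python
import numpy as np
import pandas as pd

def two_col_stat(s1, s2):
    """Hypothetical stand-in for a two-column function such as beta_kalman."""
    a1 = np.asarray(s1, dtype=float)
    a2 = np.asarray(s2, dtype=float)
    return np.cov(a1, a2)[0, 1] / np.var(a2, ddof=1)

def roll_two_columns(df, col_1, col_2, window, func):
    """Apply func(window of col_1, window of col_2) over a rolling window."""
    positions = pd.Series(np.arange(len(df), dtype=float), index=df.index)

    def on_window(pos):
        # pos holds the row positions of the current window (an array or a
        # Series, depending on the pandas version), so both columns can be sliced
        idx = np.asarray(pos).astype(int)
        block = df.iloc[idx]
        return func(block[col_1], block[col_2])

    return positions.rolling(window).apply(on_window)

# Made-up data for illustration only
rng = np.random.RandomState(1)
demo = pd.DataFrame({'asset_1': rng.normal(size=12),
                     'asset_2': rng.normal(size=12)})
rolled = roll_two_columns(demo, 'asset_1', 'asset_2', 5, two_col_stat)
```

Since beta_kalman returns an array, it would need a wrapper that returns a scalar, for example `lambda a, b: beta_kalman(a, b)[-1]`, before being passed as func.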