下面的大部分代码都是从各种来源“收集”的。最终目标是将股票的收盘价定义为尽可能快的时间范围内的均值回归或趋势。希望这个问题和答案可以作为对该主题的全面了解。不幸的是,输出让我感到困惑/冲突。
第 1 部分:基于 p 值,这看起来是趋势。基于临界值,这看起来意味着恢复。
第 2 部分:所有趋势,除了 50 的滞后。为什么这个均值回归?
第 3 部分:所有趋势。这些 p 值无处不在。如何知道使用哪种“类型”?是否需要分析原始数据图以查看趋势?如果这是趋势,这不是告诉我们的全部意义吗?
第 4 部分:所有趋势,除了 (lag, autolag) = (20, t-stat), (30, t-stat), (40, t-stat), (50, AIC)。为什么?
第 5 节:强烈均值回归。所有基准看起来都很接近。
第 6 节:为什么有多个半衰期值?哪一个是对的?
谢谢。
输入:
import pandas as pd
import numpy as np
import statsmodels.tsa.stattools as ts # time series library
from statsmodels.tsa.stattools import adfuller
import pandas_datareader.data as wb
import statsmodels.api as sm
from numpy import cumsum, log, polyfit, sqrt, std, subtract
from numpy.random import randn
#............................
df = wb.DataReader('GE','google', '2003, 5, 1', '2003, 12, 15')
print ('\n....... section 1 - adf statistic ......................')
name = pd.Series (df['Close'].values)
result = adfuller (name)
print('close - adf statistic: %f' % result[0] + ', p value: %f' % result[1])
if result[1] > 0.05:
print ('trending based on p value > 0.05')
else:
print ('mean reverting based on p value < 0.05')
print('critical values:')
for key, value in result[4].items():
print('\t%s: %.3f' % (key, value))
print ('if adf statistic > % test statistic, then trending at that %')
print ('\n..... section 2 - different lag times ...................')
lag_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60]
for lag in lag_list:
name = pd.Series (df['Close'].values)
result = adfuller (name, maxlag = lag)
if result[1] > 0.05:
print ('p-value: %f' % result[1] + ' trending based on lag = ' + str(lag))
else:
print ('p-value: %f' % result[1] + ' mean reverting based on lag = ' + str(lag))
print ('\n..... section 3 - different regression types ..........')
regress_list = ['c','ct','ctt','nc']
for regress in regress_list:
name = pd.Series (df['Close'].values)
result = adfuller (name, regression = regress)
if regress == 'c':
print ('regression for close with ' + regress + ' = constant only, p-value: %f' % result[1])
if regress == 'ct':
print ('\nregression for close with ' + regress + ' = constant and trend, p-value: %f' % result[1])
if regress == 'ctt':
print ('\nregression for close with ' + regress + ' = constant and linear and quadratic trend, p-value: %f' % result[1])
if regress == 'nc':
print ('\nregression for close with ' + regress + ' = no constant, no trend, p-value: %f' % result[1])
if result[1] > 0.05:
print ('trending based on p value > 0.05')
else:
print ('mean reverting based on p value < 0.05')
print ('\n......... section 4 - different lag types')
alag_list = ['AIC', 'BIC', 't-stat']
for lag in lag_list:
for alag in alag_list:
name = pd.Series (df['Close'].values)
result = adfuller (name, maxlag = lag, autolag = alag)
print ('\nlag = ' + str(lag) + ', autolag = ' + alag + ', p-value = %f' % result[1])
if result[1] > 0.05:
print ('trending based on p value > 0.05')
else:
print ('mean reverting based on p value < 0.05')
print ('\n........ section 5 - hurst..........................')
def hurst(ts): # Returns the Hurst Exponent of the time series vector ts
lags = range(2, 100) # Create the range of lag values
# Calculate the array of the variances of the lagged differences
tau = [sqrt(std(subtract(ts[lag:], ts[:-lag]))) for lag in lags]
poly = polyfit(log(lags), log(tau), 1) # Use a linear fit to estimate the Hurst Exponent
return poly[0]*2.0 # Return the Hurst exponent from the polyfit output
gbm = log(cumsum(randn(100000))+1000) # geometric brownian motion
mr = log(randn(100000)+1000) # mean reverting
tr = log(cumsum(randn(100000)+1)+1000) # trending
# Output the Hurst Exponent for each of the above series and the price of the ticker's close for the ADF test
print ("hurst - geometric brownian motion - random walk - should be around 0.50: %s" % hurst(gbm))
print ("hurst - mean reverting - should be around 0.00): %s" % hurst(mr))
print ("hurst - trending - should be around 1.00): %s" % hurst(tr))
print ("hurst: %s" % hurst(df['Close']))
print ('\n....... section 6 - half life ..........................')
def get_halflife(s):
s_lag = s.shift(1)
s_lag.ix[0] = s_lag.ix[1]
s_ret = s - s_lag
s_ret.ix[0] = s_ret.ix[1]
s_lag2 = sm.add_constant(s_lag)
model = sm.OLS(s_ret,s_lag2)
res = model.fit()
halflife = round(-np.log(2) / res.params[1],0)
print ('half life = ' + str(halflife))
return halflife
df.apply(get_halflife)
print ('\nprogram complete')
输出:
.................... section 1 - adf statistic ......................
close - adf statistic: -1.959732, p value: 0.304504
trending based on p value > 0.05
critical values:
5%: -2.880
if adf statistic > % test statistic, then trending at that %
10%: -2.577
if adf statistic > % test statistic, then trending at that %
1%: -3.472
if adf statistic > % test statistic, then trending at that %
.................. section 2 - different lag times ...................
p-value: 0.304504 trending based on lag = 1
p-value: 0.304504 trending based on lag = 2
p-value: 0.304504 trending based on lag = 3
p-value: 0.304504 trending based on lag = 4
p-value: 0.304504 trending based on lag = 5
p-value: 0.304504 trending based on lag = 6
p-value: 0.304504 trending based on lag = 7
p-value: 0.304504 trending based on lag = 8
p-value: 0.304504 trending based on lag = 9
p-value: 0.304504 trending based on lag = 10
p-value: 0.304504 trending based on lag = 20
p-value: 0.304504 trending based on lag = 30
p-value: 0.304504 trending based on lag = 40
p-value: 0.009252 mean reverting based on lag = 50
p-value: 0.304504 trending based on lag = 60
.......... section 3 - different regression types ....................
regression for close with c = constant only, p-value: 0.304504
trending based on p value > 0.05
regression for close with ct = constant and trend, p-value: 0.588526
trending based on p value > 0.05
regression for close with ctt = constant and linear and quadratic trend, p-
value: 0.817117
trending based on p value > 0.05
regression for close with nc = no constant, no trend, p-value: 0.739828
trending based on p value > 0.05
........................ section 4 - different lag types
lag = 1, autolag = AIC, p-value = 0.304504
trending based on p value > 0.05
lag = 1, autolag = BIC, p-value = 0.304504
trending based on p value > 0.05
lag = 1, autolag = t-stat, p-value = 0.304504
trending based on p value > 0.05
lag = 2, autolag = AIC, p-value = 0.304504
trending based on p value > 0.05
lag = 2, autolag = BIC, p-value = 0.304504
trending based on p value > 0.05
lag = 2, autolag = t-stat, p-value = 0.304504
trending based on p value > 0.05
lag = 3, autolag = AIC, p-value = 0.304504
trending based on p value > 0.05
lag = 3, autolag = BIC, p-value = 0.304504
trending based on p value > 0.05
lag = 3, autolag = t-stat, p-value = 0.304504
trending based on p value > 0.05
lag = 4, autolag = AIC, p-value = 0.304504
trending based on p value > 0.05
lag = 4, autolag = BIC, p-value = 0.304504
trending based on p value > 0.05
lag = 4, autolag = t-stat, p-value = 0.304504
trending based on p value > 0.05
lag = 5, autolag = AIC, p-value = 0.304504
trending based on p value > 0.05
lag = 5, autolag = BIC, p-value = 0.304504
trending based on p value > 0.05
lag = 5, autolag = t-stat, p-value = 0.304504
trending based on p value > 0.05
lag = 6, autolag = AIC, p-value = 0.304504
trending based on p value > 0.05
lag = 6, autolag = BIC, p-value = 0.304504
trending based on p value > 0.05
lag = 6, autolag = t-stat, p-value = 0.304504
trending based on p value > 0.05
lag = 7, autolag = AIC, p-value = 0.304504
trending based on p value > 0.05
lag = 7, autolag = BIC, p-value = 0.304504
trending based on p value > 0.05
lag = 7, autolag = t-stat, p-value = 0.304504
trending based on p value > 0.05
lag = 8, autolag = AIC, p-value = 0.304504
trending based on p value > 0.05
lag = 8, autolag = BIC, p-value = 0.304504
trending based on p value > 0.05
lag = 8, autolag = t-stat, p-value = 0.304504
trending based on p value > 0.05
lag = 9, autolag = AIC, p-value = 0.304504
trending based on p value > 0.05
lag = 9, autolag = BIC, p-value = 0.304504
trending based on p value > 0.05
lag = 9, autolag = t-stat, p-value = 0.304504
trending based on p value > 0.05
lag = 10, autolag = AIC, p-value = 0.304504
trending based on p value > 0.05
lag = 10, autolag = BIC, p-value = 0.304504
trending based on p value > 0.05
lag = 10, autolag = t-stat, p-value = 0.304504
trending based on p value > 0.05
lag = 20, autolag = AIC, p-value = 0.304504
trending based on p value > 0.05
lag = 20, autolag = BIC, p-value = 0.304504
trending based on p value > 0.05
lag = 20, autolag = t-stat, p-value = 0.046194
mean reverting based on p value < 0.05
lag = 30, autolag = AIC, p-value = 0.304504
trending based on p value > 0.05
lag = 30, autolag = BIC, p-value = 0.304504
trending based on p value > 0.05
lag = 30, autolag = t-stat, p-value = 0.009252
mean reverting based on p value < 0.05
lag = 40, autolag = AIC, p-value = 0.304504
trending based on p value > 0.05
lag = 40, autolag = BIC, p-value = 0.304504
trending based on p value > 0.05
lag = 40, autolag = t-stat, p-value = 0.009252
mean reverting based on p value < 0.05
lag = 50, autolag = AIC, p-value = 0.009252
mean reverting based on p value < 0.05
lag = 50, autolag = BIC, p-value = 0.304504
trending based on p value > 0.05
lag = 50, autolag = t-stat, p-value = 0.335375
trending based on p value > 0.05
lag = 60, autolag = AIC, p-value = 0.304504
trending based on p value > 0.05
lag = 60, autolag = BIC, p-value = 0.304504
trending based on p value > 0.05
lag = 60, autolag = t-stat, p-value = 0.335375
trending based on p value > 0.05
................ section 5 - hurst..........................
hurst - geometric brownian motion - random walk - should be around 0.50: 0.50270119846
hurst - mean reverting - should be around 0.00): 0.000107190651568
hurst - trending - should be around 1.00): 0.95085392982
hurst: 0.114851328923
.................. section 6 - half life ..........................
half life = 15.0
half life = 11.0
half life = 10.0
half life = 14.0
half life = 1.0
program complete