I am trying to download stock prices with the pandas-datareader library and compute (daily, weekly, monthly, etc.) returns with the code below.
After downloading the data, I run a kstest on the distribution of those returns and use the resulting p-value to judge whether it resembles a bi-normal distribution (a weighted sum of two normal distributions).
Since a single kstest only evaluates one candidate parameter set, I want to use scipy's minimize to maximize the p-value (by minimizing -p-value), varying the means, standard deviations, and weights of the two component distributions.
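To illustrate what I mean, here is a minimal, self-contained sketch on synthetic data (the mix_cdf helper and the sample parameters here are only for illustration, they are not part of my real code below):

import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)

# Synthetic "returns": roughly a 20/80 mixture of two normals.
sample = np.where(rng.random(2000) < 0.2,
                  rng.normal(0.002, 0.011, 2000),
                  rng.normal(-0.002, 0.023, 2000))

def mix_cdf(x, w1, m1, s1, m2, s2):
    # CDF of a two-component normal mixture; the second weight is 1 - w1.
    return w1 * st.norm.cdf(x, m1, s1) + (1 - w1) * st.norm.cdf(x, m2, s2)

stat, pvalue = st.kstest(sample, cdf=mix_cdf, args=(0.2, 0.002, 0.011, -0.002, 0.023))
print(stat, pvalue)  # a high p-value means the mixture is consistent with the sample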
import warnings
import numpy as np
import pandas as pd
import scipy.stats as st
from scipy.optimize import minimize
import statsmodels as sm
import matplotlib
import matplotlib.pyplot as plt
from pandas_datareader import data
import time
import xlwt
import matplotlib.ticker as mtick
from sklearn import datasets
def Puxa_Preco(ticker, start_date, end_date, lag):
    # Download prices and compute log returns over the given lag (in trading days).
    dados = data.get_data_yahoo(ticker, start_date, end_date)
    data_set = np.log(dados['Close']) - np.log(dados['Close'].shift(lag))
    data_set = data_set.ffill()
    data_set = data_set.dropna()
    y = data_set
    print(y)
    return y
def mixnormal_cdf(distribuicao, weight1, mean1, stdv1, weight2, mean2, stdv2):
    """CDF of a mixture of two normal distributions (the weights should sum to 1)."""
    return (weight1 * st.norm.cdf(distribuicao, mean1, stdv1) +
            weight2 * st.norm.cdf(distribuicao, mean2, stdv2))
def Objetivo(X, distribuicao):
    '''
    Kolmogorov-Smirnov test of the data against the mixture CDF.
    The p-value is the probability of observing a KS statistic at least this
    large if the data really came from the given distribution, so a low p-value
    means the data are likely not from the tested distribution. Note that the
    shape, location, and scale parameters must be specified for the result to
    be meaningful. stat2 is the test statistic, i.e. the maximum distance
    between the empirical and theoretical cumulative distributions.
    '''
    peso_dist_1 = X[0]
    mi1 = X[1]
    sigma1 = X[2]
    peso_dist_2 = 1 - X[0]
    mi2 = X[3]
    sigma2 = X[4]
    stat2, pvalue = st.kstest(distribuicao, cdf=mixnormal_cdf,
                              args=(peso_dist_1, mi1, sigma1, peso_dist_2, mi2, sigma2))
    return -pvalue
ticker = 'PETR4.SA'
start_date = '2010-01-02'  # yyyy-mm-dd
end_date = '2015-01-02'

for lag in range(1, 503):
    distribuicao = Puxa_Preco(ticker, start_date, end_date, lag)
    n = len(distribuicao)
    # X = [peso_dist_1, mi1, sigma1, mi2, sigma2]; peso_dist_2 = 1 - peso_dist_1
    ChuteInicial = [0.3, 0.0010, 0.0010, -0.0030, 0.0830]
    test = [0.2, 0.0020, 0.0110, -0.0020, 0.0230]  # implies peso_dist_2 = 0.8
    Limites = ([0, 1], [-50, 50], [0, 50], [-50, 50], [0, 50])
    print("------------------------------------------------------------------------------------------------")
    print("Validation Test:")
    print(-Objetivo(test, distribuicao))  # should be around 0.90 if the objective function is OK
    solution = minimize(fun=Objetivo, x0=ChuteInicial, args=(distribuicao,),
                        method='SLSQP', bounds=Limites)  # minimize -p-value
    print("------------------------------------------------------------------------------------------------")
    print("solution:")
    print(solution)
It finds the following solution:
fun: -8.098252265651002e-53
jac: array([-2.13080032e-35, 0.00000000e+00, 0.00000000e+00, -1.93307671e-34, 7.91878934e-35])
message: 'Optimization terminated successfully.'
nfev: 8
nit: 1
njev: 1
status: 6
success: True
x: array([ 0.3 , 0.001, 0.001, -0.003, 0.083])
But I know the correct answer should be close to the test vector: weight1 = 0.2, mean1 = 0.0020, stdv1 = 0.0110, weight2 = 0.8, mean2 = -0.0020, stdv2 = 0.0230, which yields a p-value of about 0.90.
It looks to me like minimize runs only a handful of evaluations and, since the p-value never changes between them, it stops right at the starting point.
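To check that suspicion, here is a small sketch (the wrapper name Objetivo_logged is mine, and it reuses the names from the code above) that logs every candidate SLSQP evaluates together with its p-value:

def Objetivo_logged(X, distribuicao):
    # Same objective, but print each candidate and its p-value so the number
    # of evaluations SLSQP actually performs becomes visible.
    val = Objetivo(X, distribuicao)
    print(np.round(X, 6), "p-value:", -val)
    return val

solution = minimize(fun=Objetivo_logged, x0=ChuteInicial, args=(distribuicao,),
                    method='SLSQP', bounds=Limites)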
Is there a way to make minimize keep searching until it finds a p-value greater than 0.9? Can someone help me?
I also tried minimize with the Nelder-Mead method, which seems more accurate, but it does not get anywhere near the p-value of 0.9 that should be the answer, and I do not know whether Nelder-Mead even takes the bounds I supply into account.
#solution = minimize(fun=Objetivo, x0=ChuteInicial, args=(distribuicao,), method='Nelder-Mead', bounds=Limites, options={'maxiter': 1000000})
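For reference, this is roughly how I imagine handing the same objective to scipy.optimize.differential_evolution, which does enforce bounds, in case a global search is the better direction (the bound ranges and maxiter here are placeholders, not values I have validated):

from scipy.optimize import differential_evolution

# Bounds for [peso_dist_1, mi1, sigma1, mi2, sigma2]; the sigma lower bounds are
# kept strictly positive so the mixture CDF stays well defined.
Limites_de = [(0, 1), (-1, 1), (1e-6, 1), (-1, 1), (1e-6, 1)]
solution_de = differential_evolution(Objetivo, Limites_de, args=(distribuicao,),
                                     maxiter=1000, seed=42)
print(solution_de.x, -solution_de.fun)  # best parameters and the corresponding p-value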