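I'm trying to write my own logistic regression and compare different methods of maximizing the log-likelihood. With the Newton-CG method I get the error message "ValueError: setting an array element with a sequence". From what I've read, this error seems to be raised when the function being minimized returns a non-scalar, but that does not seem to be the case here. I need the three methods below to give (approximately) the same result, but when I run them on my real data, one does not converge, another gives a worse LL than the initial guess, and the third does not run at all.
Why do I get the ValueError, and how can I fix it?
My code (with dummy data; the real data has about 100 measurements) is as follows: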
import numpy as np
from numpy import linalg
import scipy
from scipy.optimize import minimize
def CalcLL(beta,xinlist,yinlist):
    # Negative log-likelihood -LL (minimize() minimizes, so the sign is flipped)
    LL=0.0
    ncol=len(beta)
    pi=FindPi(xinlist,beta.reshape(ncol,1))
    for i in range(len(yinlist)):
        LL=LL+np.where(yinlist[i]==1,np.log(pi[i]),np.log(1-pi[i]))
    return -LL
def Jacobian(beta,xinlist,yinlist):
    # (y - pi)^T X: gradient of LL, returned as a 1 x ncol np.matrix
    ncol=len(beta)
    nrow=np.shape(xinlist)[0]
    pi=FindPi(xinlist,beta.reshape(ncol,1))
    Jac=np.transpose(np.matrix(yinlist-pi))*np.matrix(xinlist)
    return Jac
def Hessian(beta,xinlist,yinlist):
    # X^T W X with W = diag(pi*(1-pi))
    ncol=len(beta)
    nrow=np.shape(xinlist)[0]
    pi=FindPi(xinlist,beta.reshape(ncol,1))
    W=FindW(pi)
    Hes=np.matrix(np.transpose(xinlist))*(np.matrix(W)*np.matrix(xinlist))
    return Hes
def FindPi(xinlist,beta):
    rows=np.shape(xinlist)[0] # Number of rows in xinlist
    cols=np.shape(xinlist)[1] # Number of columns in xinlist
    expon=np.dot(xinlist,beta)
    expon=np.array(expon).reshape(rows,1)
    pi=np.exp(expon)/(1+np.exp(expon)) # logistic function, one probability per row
    return pi
def FindW(pi):
    # Diagonal weight matrix with W[i,i] = pi_i*(1-pi_i)
    W=np.zeros(len(pi)*len(pi)).reshape(len(pi),len(pi))
    for i in range(len(pi)):
        W[i,i]=float(pi[i]*(1-pi[i]))
    return W
xinlist=np.matrix([[1,1],[0,1],[1,1],[1,1],[1,1],[0,1],[0,1],[1,1],[1,1],[0,1]])
yinlist=np.transpose(np.matrix([0,0,0,0,0,1,1,1,1,1]))
ncol=np.shape(xinlist)[1]
beta1=np.zeros(ncol).reshape(ncol,1) # Initial guess for parameter values
limit=0.000001 # self-written Newton-Raphson method
iter_i=limit+1
while iter_i>limit:
    Hes=Hessian(beta1,xinlist,yinlist)
    Jac=np.transpose(Jacobian(beta1,xinlist,yinlist))
    root_diff=np.array(linalg.inv(Hes)*Jac)
    beta1=beta1+root_diff
    iter_i=np.sum(root_diff*root_diff)
print "When running self-written algorithm, the log-likelihood is",-CalcLL(beta1,xinlist,yinlist)
beta2=np.zeros(ncol).reshape(ncol,1)
res=minimize(CalcLL,beta2,args=(xinlist,yinlist),method='Nelder-Mead',options={'xtol':1e-8,'disp':True,'maxiter':10000})
beta2=res.x
print "The log-likelihood using Nelder-Mead is", -CalcLL(beta2,xinlist,yinlist)
beta3=np.zeros(ncol).reshape(ncol,1)
res=minimize(CalcLL,beta3,args=(xinlist,yinlist),method='Newton-CG',jac=Jacobian,hess=Hes,options={'xtol':1e-8,'disp':True})
beta3=res.x
print "The log-likelihood using Newton-CG is", -CalcLL(beta3,xinlist,yinlist)
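For reference, the helper functions appear to implement the standard logistic-regression formulas below. This is a sketch of the textbook derivation written in the question's notation (X = xinlist with rows x_i^T, y = yinlist, pi = output of FindPi, W = output of FindW), not something stated in the original post.

```latex
\[
\ell(\beta) = \sum_{i=1}^{n} \bigl[\, y_i \log \pi_i + (1 - y_i) \log(1 - \pi_i) \,\bigr],
\qquad
\pi_i = \frac{e^{x_i^\top \beta}}{1 + e^{x_i^\top \beta}}
\]
\[
\nabla \ell(\beta) = X^\top (y - \pi),
\qquad
\nabla^2 \ell(\beta) = -X^\top W X,
\qquad
W = \operatorname{diag}\bigl(\pi_i (1 - \pi_i)\bigr)
\]
\[
\beta^{(k+1)} = \beta^{(k)} + \bigl(X^\top W X\bigr)^{-1} X^\top (y - \pi),
\qquad \pi, W \text{ evaluated at } \beta^{(k)}
\]
```

The last line is the step taken by the while loop (root_diff). Because CalcLL returns -LL, the function handed to minimize is f = -LL, whose gradient is X^T(pi - y) and whose Hessian is X^T W X. Hessian() therefore matches f, whereas Jacobian() returns (y - pi)^T X, i.e. the gradient of f with the opposite sign and as a row matrix. The self-written loop is consistent because it pairs the gradient of LL with X^T W X.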
Edit: the error stack is as follows:
Traceback (most recent call last):
  File "MyLogisticRegression2.py", line 62, in <module>
    res=minimize(CalcLL,beta3,args=(xinlist,yinlist),method='Newton-CG',jac=Jacobian,hess=Hes,options={'xtol': 1e-8,'disp':True})
  File "C:\Python27\lib\site-packages\scipy\optimize\_minimize.py", line 447, in minimize
    **options)
  File "C:\Python27\lib\site-packages\scipy\optimize\optimize.py", line 2393, in _minimize_newtoncg
    eta=numpy.min([0.5, numpy.sqrt(maggrad)])
  File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 2393, in amin
    out=out,**kwargs)
  File "C:\Python27\lib\site-packages\numpy\core\_methods.py", line 29, in _amin
    return umr_minimum(a,axis,None,out,keepdims)
ValueError: setting an array element with a sequence
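The traceback stops at eta=numpy.min([0.5, numpy.sqrt(maggrad)]) inside _minimize_newtoncg, where maggrad is computed from whatever jac returns. A likely cause is that Jacobian returns a 1 x ncol np.matrix instead of a 1-D array, so maggrad is not a scalar and NumPy cannot pack [0.5, <matrix>] into one array. Two further issues in that call: hess=Hes passes the last Hessian matrix computed by the Newton-Raphson loop rather than the function Hessian, and Jacobian is the gradient of LL while the function being minimized is CalcLL = -LL. Below is a minimal sketch under those assumptions, in Python 3 with plain NumPy arrays (NumPy's documentation recommends regular arrays over np.matrix). The names sigmoid, neg_log_lik, neg_grad and neg_hess are hypothetical replacements for FindPi, CalcLL, Jacobian and Hessian; scipy.optimize.check_grad compares the analytic gradient with finite differences.

```python
import numpy as np
from scipy.optimize import minimize, check_grad

# Same dummy data as in the question, as plain ndarrays instead of np.matrix
X = np.array([[1, 1], [0, 1], [1, 1], [1, 1], [1, 1],
              [0, 1], [0, 1], [1, 1], [1, 1], [0, 1]], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)


def sigmoid(z):
    """Logistic function, equivalent to exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))


def neg_log_lik(beta, X, y):
    """-LL(beta) as a plain float; beta is the 1-D array minimize() passes in."""
    pi = sigmoid(X @ beta)
    return -np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))


def neg_grad(beta, X, y):
    """Gradient of -LL: X^T (pi - y), a 1-D array of length ncol."""
    pi = sigmoid(X @ beta)
    return X.T @ (pi - y)


def neg_hess(beta, X, y):
    """Hessian of -LL: X^T W X with W = diag(pi * (1 - pi))."""
    pi = sigmoid(X @ beta)
    w = pi * (1 - pi)
    return X.T @ (w[:, None] * X)


beta0 = np.zeros(X.shape[1])  # 1-D starting point, not an (ncol, 1) column

# Analytic gradient vs. finite differences; should be close to 0
print("gradient check:", check_grad(neg_log_lik, neg_grad, beta0, X, y))

res = minimize(neg_log_lik, beta0, args=(X, y), method='Newton-CG',
               jac=neg_grad, hess=neg_hess,  # pass the functions, not their values
               options={'xtol': 1e-8})
print("beta:", res.x)
print("log-likelihood:", -res.fun)
```

For this dummy data the model is saturated (one binary predictor plus an intercept), so the estimates should match the group log-odds: about -ln 6 ≈ -1.79 for the first coefficient and ln 3 ≈ 1.10 for the intercept. Running the original Jacobian through check_grad in the same way is a quick way to spot the sign mismatch.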