
I have an array that contains error values as a function of two different quantities (alpha and eigRange).

I fill my array like this:

    for j in range(n):
        for i in range(alphaLen):
            alpha = alpha_list[i]
            c = train.eig(xt_, yt_, m - j, m, alpha, "cpu")
            costListTrain[j, i] = cost.err(xt_, xt_, yt_, c)

    normedValues = costListTrain / np.max(costListTrain.ravel())

where

n = 20
alpha_list = [0.0001,0.0003,0.0008,0.001,0.003,0.006,0.01,0.03,0.05]

My costListTrain array contains values with very small differences, for example:

2.809458902485728 2.809458905776425 2.809458913576337 2.809459011062461
2.030326752376704 2.030329906064879 2.030337351188699 2.030428976282031
1.919840839066182 1.919846470077076 1.919859731440199 1.920021453630778
1.858436351617677 1.858444223016128 1.858462730482461 1.858687054377165
1.475871326997542 1.475901926855846 1.475973476249240 1.476822830933632
1.475775410801635 1.475806023102173 1.475877601316863 1.476727286424228
1.475774284270633 1.475804896751524 1.475876475382906 1.476726165223209
1.463578292548192 1.463611627166494 1.463689466240788 1.464609083309240
1.462859608038034 1.462893157900139 1.462971489632478 1.463896516033939
1.461912706143012 1.461954067956570 1.462047793798572 1.463079574605320
1.450581041157659 1.452770209885761 1.454835202839513 1.459676311335618
… (the remaining entries repeat values near 1.4505…–1.4597… that differ only around the 11th decimal place)

As you can see, the values are very, very close!

I am trying to plot this data so that the two quantities are on the x and y axes and the error value is represented by the colour of each point.

This is how I plot my data:

    alpha_list = np.log(alpha_list)
    eigenvalues, alphaa = np.meshgrid(eigRange, alpha_list)

    vMin = np.min(costListTrain)
    vMax = np.max(costListTrain)

    plt.scatter(x, y, s=70, c=normedValues, vmin=vMin, vmax=vMax, alpha=0.50)

But the result is not correct.

  • I tried to normalize my error values by dividing them all by the max, but it didn't work!

  • The only way I could make it work (which is incorrect) is to normalize my data in two different ways: one based on each column (meaning factor 1 is constant and factor 2 varies), and the other based on each row (meaning factor 2 is constant and factor 1 varies). But that doesn't really make sense, because I need a single plot that shows the trade-off between the two quantities in terms of the error value.
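The first bullet above can be illustrated numerically: dividing by a single global maximum rescales every value by the same factor, so the relative spread of closely spaced errors is unchanged and the colours stay just as indistinguishable. A minimal sketch with made-up numbers in the same range as the values shown earlier:

```python
import numpy as np

# Closely spaced error values, similar in spread to the costListTrain entries
errs = np.array([1.450581, 1.452770, 1.454835, 1.459676])

# Global-max normalization, as attempted in the question
normed = errs / errs.max()

# The relative spread (range divided by mean) is identical before and after,
# because every value was scaled by the same constant.
spread_before = (errs.max() - errs.min()) / errs.mean()
spread_after = (normed.max() - normed.min()) / normed.mean()
print(np.isclose(spread_before, spread_after))  # True
```

This is why a linear colour map cannot separate the points: the normalization changes the absolute numbers but not their ratios.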

Update

This is what I meant in my last paragraph. Normalizing the values based on the maximum of each row, which corresponds to the eigenvalues:

maxsEigBasedTrain = np.amax(costListTrain.T, 1)[:, np.newaxis]
maxsEigBasedTest = np.amax(costListTest.T, 1)[:, np.newaxis]

normEigCostTrain = costListTrain.T / maxsEigBasedTrain
normEigCostTest = costListTest.T / maxsEigBasedTest

Normalizing the values based on the maximum of each column, which corresponds to alpha:

maxsAlphaBasedTrain = np.amax(costListTrain, 1)[:, np.newaxis]
maxsAlphaBasedTest = np.amax(costListTest, 1)[:, np.newaxis]

normAlphaCostTrain = costListTrain / maxsAlphaBasedTrain
normAlphaCostTest = costListTest / maxsAlphaBasedTest
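As a sanity check on the two normalizations above: the row-wise and column-wise maxima broadcast differently, so the two normalized arrays generally disagree, which is why the two plots cannot be reconciled into one. A small sketch with a hypothetical 2×3 cost array:

```python
import numpy as np

cost = np.array([[2.0, 4.0, 8.0],
                 [1.0, 2.0, 4.0]])

# Normalize each row by its own maximum
# (np.newaxis keeps a column vector so broadcasting works row-wise)
row_normed = cost / np.amax(cost, 1)[:, np.newaxis]

# Normalize each column by its own maximum
col_normed = cost / np.amax(cost, 0)[np.newaxis, :]

print(row_normed)  # every row now peaks at 1.0
print(col_normed)  # every column now peaks at 1.0
```

Each normalization only makes values comparable *within* a row or *within* a column; the two results are different arrays, so neither gives a single consistent picture of the trade-off.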

Plot 1:

(image: plot 1)

Where no. eigenvalue = 10 and alpha varies (this should correspond to column 10 of plot 1):

(image: plot 2)

Where alpha = 0.0001 and the eigenvalues vary (this should correspond to the first row of plot 1):

(image: plot 3)

But as you can see, the results differ from plot 1!

Update:

Just to clarify things further, this is how I read my data:

import math
import numpy as np
from sklearn import datasets
from sklearn.datasets.samples_generator import make_regression

rng = np.random.RandomState(0)
diabetes = datasets.load_diabetes()

X_diabetes, y_diabetes = diabetes.data, diabetes.target
X_diabetes = np.c_[np.ones(len(X_diabetes)), X_diabetes]  # prepend a bias column
ind = np.arange(X_diabetes.shape[0])
rng.shuffle(ind)
#===============================================================================
# Split Data
#===============================================================================
cross = math.ceil(0.7 * len(X_diabetes))
ind_train = ind[:cross]
X_train, y_train = X_diabetes[ind_train], y_diabetes[ind_train]

ind_val = ind[cross:]
X_val, y_val = X_diabetes[ind_val], y_diabetes[ind_val]

I have also uploaded the .csv files here:

log.csv contains the raw values of plot 1, before normalization

normalizedLog.csv for plot 1

eigenConst.csv for plot 2

alphaConst.csv for plot 3

1 Answer

I think I found the answer. First, there was a problem in my code: I expected the "number of eigenvalues" to correspond to the rows, but in my for loop they were filling the columns. The correct version is this:

for i in range(alphaLen):
    for j in range(n):
        alpha = alpha_list[i]
        c = train.eig(xt_, yt_, m - j, m, alpha, "cpu")
        costListTrain[i, j] = cost.err(xt_, xt_, yt_, c)
        costListTest[i, j] = cost.err(xt_, xv_, yv_, c)

After asking friends and colleagues, I got this answer:

  I would assume that imshow, and the other plotting commands you might want to use, by default apply equally sized intervals to the values you are plotting. If you can set that to logarithmic, you should be fine. Ideally, I guess, equally "populated bins" would prove most effective.

For the plot I simply subtracted the min value from the errors, added a small number, and finally took the log.

 temp = costListTrain - costListTrain.min()
 temp += 0.00000001
 extent = [0, 20, alpha_list[0], alpha_list[-1]]

 plt.imshow(np.log(temp), interpolation="nearest", cmap=plt.get_cmap('spectral'),
            extent=extent, origin="lower")
 plt.colorbar()
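A more direct alternative to the manual shift-and-log above (my own suggestion, not part of the original answer) is to let matplotlib apply the log scaling through `matplotlib.colors.LogNorm`; the shift is still needed so every value is strictly positive. The array below is a hypothetical stand-in for `costListTrain`:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# Hypothetical stand-in for costListTrain: closely spaced positive values,
# shaped (alphaLen, n) as in the corrected loop
costListTrain = 1.45 + 0.01 * np.random.RandomState(0).rand(9, 20)

# Shift so the smallest value maps to a small positive number, as in the answer
temp = costListTrain - costListTrain.min() + 1e-8

# LogNorm maps colours on a log scale, replacing the explicit np.log(temp)
plt.imshow(temp, interpolation="nearest", norm=LogNorm(),
           origin="lower", aspect="auto")
plt.colorbar()
plt.savefig("cost_lognorm.png")
```

This also keeps the colorbar labelled with the actual (shifted) error values instead of their logarithms.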

The result is:

(image: final log-scaled plot)

Answered 2012-12-11T12:42:25.787