I am trying to minimize the log loss over a set of predictions, but scipy.optimize.minimize appears to return a suboptimal result.

The data comes from a pandas table:

click,prob1,prob2,prob3
0, 0.0023, 0.0024, 0.012
1, 0.89, 0.672, 0.78
0, 0.43, 0.023, 0.032

from scipy.optimize import minimize 
from math import log
import numpy as np
import pandas as pd

def logloss(p, y):
  p = max(min(p, 1 - 10e-15), 10e-15)
  return -log(p) if y == 1 else -log(1 - p)

def ensemble_weights(weights, probs, y_true):
  loss = 0
  final_pred = []
  prob_length = len(probs)

  # Blend each row's model probabilities using the candidate weights
  for i in range(prob_length):
    w_sum = 0
    for index, weight in enumerate(weights):
      w_sum += probs[i][index] * weight

    final_pred.append(w_sum)

  # Sum the per-row log loss of the blended predictions
  for index, pred in enumerate(final_pred):
    loss += logloss(pred, y_true[index])

  # Debug output: one line per objective evaluation
  print(loss / prob_length, 'weights :=', weights)
  return loss / prob_length


## w0 is the initial guess for the minimum of function 'fun'
## This initial guess is that all weights are equal
w0 = [1.0 / probs.shape[1]] * probs.shape[1]

## This sets the bounds on the weights, between 0 and 1
bnds = [(0, 1)] * probs.shape[1]
## This sets the constraints on the weights, they must sum to 1
## Or, in other words, 1 - sum(w) = 0
cons = ({'type':'eq','fun':lambda w: 1 - np.sum(w)})

weights = minimize(
    ensemble_weights,
    w0,
    (probs,y_true),
    method='SLSQP',
    bounds=bnds,
    constraints=cons
)
## As a sanity check, make sure the weights do in fact sum to 1
## (note: weights['fun'] is the final loss value, not the sum of the weights)
print("Weights sum to %0.4f:" % weights['fun'])
print(weights['x'])

To help with debugging, I added a print statement to the objective function. It produces the following output.

0.0101326509533 weights := [ 1. 0. 0.]
0.0101326509533 weights := [ 1. 0. 0.]
0.0101326509702 weights := [ 1.00000001 0. 0. ]
0.0101292476389 weights := [ 1.00000000e+00 1.49011612e-08 0.00000000e+00]
0.0101326509678 weights := [ 1.00000000e+00 0.00000000e+00 1.49011612e-08]
0.0102904525781 weights := [ -4.44628778e-10 1.00000000e+00 -4.38298620e-10]
0.00938612854966 weights := [ 5.00000345e-01 4.99999655e-01 -2.19149158e-10]
0.00961930211064 weights := [ 7.49998538e-01 2.50001462e-01 -1.09575296e-10]
0.00979499597866 weights := [ 8.74998145e-01 1.25001855e-01 -5.47881403e-11]
0.00990978430231 weights := [ 9.37498333e-01 6.25016666e-02 -2.73943942e-11]
0.00998305685424 weights := [ 9.68748679e-01 3.12513212e-02 -1.36974109e-11]
0.0100300175342 weights := [ 9.84374012e-01 1.56259881e-02 -6.84884901e-12]
0.0100605546439 weights := [ 9.92186781e-01 7.81321874e-03 -3.42452299e-12]
0.0100807513117 weights := [ 9.96093233e-01 3.90676721e-03 -1.71233067e-12]
0.0100942930446 weights := [ 9.98046503e-01 1.95349723e-03 -8.56215139e-13]
0.0101034594634 weights := [ 9.99023167e-01 9.76832595e-04 -4.28144378e-13]
0.0101034594634 weights := [ 9.99023167e-01 9.76832595e-04 -4.28144378e-13]
0.0101034594804 weights := [ 9.99023182e-01 9.76832595e-04 -4.28144378e-13]
0.0101034593149 weights := [ 9.99023167e-01 9.76847497e-04 -4.28144378e-13]
0.010103459478 weights := [ 9.99023167e-01 9.76832595e-04 1.49007330e-08]
Weights sum to 0.0101:
[ 9.99023167e-01 9.76832595e-04 -4.28144378e-13]

I expected the optimizer to return the weights with the lowest loss it evaluated: 0.00938612854966 weights := [ 5.00000345e-01 4.99999655e-01 -2.19149158e-10]

Can anyone see an obvious problem?

FYI: this code is actually a hack of a Kaggle Otto script: https://www.kaggle.com/hsperr/otto-group-product-classification-challenge/finding-ensamble-weights

1 Answer

Solved it by passing

options = {'ftol':1e-9}

as part of the minimize call.
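As a minimal sketch of where that option goes, the call below rebuilds the setup from the question using only the three sample rows shown there. The vectorized objective `ensemble_logloss` is a name introduced here for illustration and stands in for the original loop. SLSQP's `ftol` defaults to 1e-6, so passing 1e-9 asks for a tighter stopping tolerance on the objective value. Results on the full data set may differ.

```python
import numpy as np
from scipy.optimize import minimize

# The three sample rows from the question: click label, then prob1..prob3.
y_true = np.array([0, 1, 0])
probs = np.array([
    [0.0023, 0.0024, 0.012],
    [0.89, 0.672, 0.78],
    [0.43, 0.023, 0.032],
])


def ensemble_logloss(weights, probs, y_true, eps=10e-15):
    """Mean log loss of the weighted blend of model probabilities.

    Illustrative vectorized stand-in for the question's loop-based objective.
    """
    # Blend the columns, then clip away from 0 and 1 so log() stays finite.
    pred = np.clip(probs @ weights, eps, 1 - eps)
    return -np.mean(y_true * np.log(pred) + (1 - y_true) * np.log(1 - pred))


n_models = probs.shape[1]
w0 = np.full(n_models, 1.0 / n_models)                  # equal starting weights
bnds = [(0, 1)] * n_models                              # each weight in [0, 1]
cons = {'type': 'eq', 'fun': lambda w: 1 - np.sum(w)}   # weights sum to 1

result = minimize(
    ensemble_logloss,
    w0,
    args=(probs, y_true),
    method='SLSQP',
    bounds=bnds,
    constraints=cons,
    options={'ftol': 1e-9},  # tighter than the SLSQP default of 1e-6
)

print("loss:", result.fun)
print("weights:", result.x, "sum:", result.x.sum())
```

Printing `result.x.sum()` alongside `result.fun` keeps the weight-sum check separate from the loss value.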

Answered 2015-06-26T06:37:33.213