
GridSearchCV (whether from sklearn or from dask) seems to pass its parameters oddly or incorrectly, so that MLPRegressor ignores them. I'll demonstrate the behavior with a minimal working example.

Assume numeric arrays features and values are already initialized; in my case

print(features.shape)
print(values.shape)
(321278, 36)
(321278,)

and run the following code

from dask_ml.model_selection import GridSearchCV as daskGridSearchCV
from sklearn.model_selection import GridSearchCV as skGridSearchCV
from sklearn.neural_network import MLPRegressor
myparams = {'hidden_layer_sizes': [(2, ), (4, )]}
daskgridCV = daskGridSearchCV(estimator=MLPRegressor(), n_jobs=-1, param_grid=myparams)
daskbestfit = daskgridCV.fit(features, values)
skgridCV = skGridSearchCV(estimator=MLPRegressor(), n_jobs=-1, param_grid=myparams, cv=3)
skbestfit = skgridCV.fit(features, values)
display(daskbestfit)
display(skbestfit)

The result is

GridSearchCV(cache_cv=True, cv=None, error_score='raise',
             estimator=MLPRegressor(activation='relu', alpha=0.0001,
                                    batch_size='auto', beta_1=0.9, beta_2=0.999,
                                    early_stopping=False, epsilon=1e-08,
                                    hidden_layer_sizes=(100,),
                                    learning_rate='constant',
                                    learning_rate_init=0.001, max_iter=200,
                                    momentum=0.9, n_iter_no_change=10,
                                    nesterovs_momentum=True, power_t=0.5,
                                    random_state=None, shuffle=True,
                                    solver='adam', tol=0.0001,
                                    validation_fraction=0.1, verbose=False,
                                    warm_start=False),
             iid=True, n_jobs=-1,
             param_grid={'hidden_layer_sizes': [(2,), (4,)]}, refit=True,
             return_train_score=False, scheduler=None, scoring=None)
GridSearchCV(cv=3, error_score='raise-deprecating',
             estimator=MLPRegressor(activation='relu', alpha=0.0001,
                                    batch_size='auto', beta_1=0.9, beta_2=0.999,
                                    early_stopping=False, epsilon=1e-08,
                                    hidden_layer_sizes=(100,),
                                    learning_rate='constant',
                                    learning_rate_init=0.001, max_iter=200,
                                    momentum=0.9, n_iter_no_change=10,
                                    nesterovs_momentum=True, power_t=0.5,
                                    random_state=None, shuffle=True,
                                    solver='adam', tol=0.0001,
                                    validation_fraction=0.1, verbose=False,
                                    warm_start=False),
             iid='warn', n_jobs=-1,
             param_grid={'hidden_layer_sizes': [(2,), (4,)]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)

So in both cases the hidden_layer_sizes parameter has the value (100,), which is not in the grid. Am I doing something wrong, or what is going on here?

Python version 3.6.9
sklearn version 0.21.2
dask_ml version 1.0.0


2 Answers


This is perfectly normal. When you initialize GridSearchCV with estimator=MLPRegressor(), an MLPRegressor instance is created with its default values ((100,) is the default for the hidden_layer_sizes parameter). Displaying the search object simply prints that estimator template, not the candidates that were evaluated.

When you fit the GridSearchCV on your data, it iterates over every possible combination of the hyperparameters in myparams on each fold and selects the best one. You can inspect the cross-validation results by accessing skgridCV.cv_results_, and the winning combination via skgridCV.best_params_.
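A minimal sketch of this distinction, using small synthetic data (the random arrays and the reduced max_iter are assumptions for illustration): the estimator attribute keeps its defaults, while the tuned value is found in best_params_.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Synthetic stand-ins for the question's features/values arrays.
rng = np.random.RandomState(0)
features = rng.rand(200, 3)
values = rng.rand(200)

myparams = {'hidden_layer_sizes': [(2,), (4,)]}
# max_iter is lowered only to keep this sketch fast; expect ConvergenceWarnings.
grid = GridSearchCV(estimator=MLPRegressor(max_iter=50),
                    param_grid=myparams, cv=3)
grid.fit(features, values)

# The template estimator still shows the default...
print(grid.estimator.hidden_layer_sizes)        # (100,)
# ...but the selected value comes from the grid.
print(grid.best_params_['hidden_layer_sizes'])  # (2,) or (4,)
```

With refit=True (the default), grid.best_estimator_ is an MLPRegressor refitted on the full data with the winning parameters.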

Answered on 2019-09-25T20:34:49.403

The answer is simply to add 'hidden_layer_sizes' as another entry in the parameter grid, with the layer sizes you want to try.

Thanks

Here is an example:

import numpy as np

parameters = {
    'learning_rate': ['constant', 'adaptive'],
    'solver': ['lbfgs', 'adam'],
    'tol': 10.0 ** -np.arange(1, 6),
    'verbose': [True],
    'early_stopping': [True],
    'activation': ['tanh', 'logistic'],
    'learning_rate_init': 10.0 ** -np.arange(1, 6),
    'max_iter': [5000],
    'alpha': (.0001, .0002, .0003, .0004, .00005, .00006),
    'hidden_layer_sizes': np.arange(1, 2),
    'random_state': np.arange(1, 4)
}
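A grid this size multiplies out to many candidates, so before launching a long search it can help to enumerate what the grid will actually generate. A minimal sketch using sklearn's ParameterGrid (the small grid below is illustrative, not the full one above):

```python
from sklearn.model_selection import ParameterGrid

# A deliberately small illustrative grid: 3 layer shapes x 2 activations.
params = {'hidden_layer_sizes': [(2,), (4,), (8, 4)],
          'activation': ['tanh', 'logistic']}

candidates = list(ParameterGrid(params))
print(len(candidates))  # 6 combinations
for c in candidates:
    print(c)
```

Each printed dict is one candidate that GridSearchCV would fit and score per fold, which makes it easy to sanity-check that the hidden_layer_sizes values you intended are really in the grid.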
Answered on 2019-12-02T11:23:55.417