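I'm fairly new to machine learning. I'm currently working on an imbalanced binary classification problem, and I need to test different models and how they perform under different sampling techniques. I have settled on a pipeline called model, which consists of a ColumnTransformer preprocessor, a TomekLinks sampler, and a KNeighborsClassifier estimator. When I call model.get_params(), I get the following output: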

{'classification': KNeighborsClassifier(leaf_size=120),
 'classification__algorithm': 'auto',
 'classification__leaf_size': 120,
 'classification__metric': 'minkowski',
 'classification__metric_params': None,
 'classification__n_jobs': None,
 'classification__n_neighbors': 5,
 'classification__p': 2,
 'classification__weights': 'uniform',
 'memory': None,
 'preprocessor': ColumnTransformer(remainder='passthrough',
                   transformers=[('knnimputer', KNNImputer(),
                                  ['policePrecinct']),
                                 ('onehotencoder-1', OneHotEncoder(),
                                  ['gender']),
                                 ('standardscaler', StandardScaler(),
                                  ['long', 'lat']),
                                 ('onehotencoder-2', OneHotEncoder(),
                                  ['neighborhood', 'problem'])]),
 'preprocessor__knnimputer': KNNImputer(),
 'preprocessor__knnimputer__add_indicator': False,
 'preprocessor__knnimputer__copy': True,
 'preprocessor__knnimputer__metric': 'nan_euclidean',
 'preprocessor__knnimputer__missing_values': nan,
 'preprocessor__knnimputer__n_neighbors': 5,
 'preprocessor__knnimputer__weights': 'uniform',
 'preprocessor__n_jobs': None,
 'preprocessor__onehotencoder-1': OneHotEncoder(),
 'preprocessor__onehotencoder-1__categories': 'auto',
 'preprocessor__onehotencoder-1__drop': None,
 'preprocessor__onehotencoder-1__dtype': numpy.float64,
 'preprocessor__onehotencoder-1__handle_unknown': 'error',
 'preprocessor__onehotencoder-1__sparse': True,
 'preprocessor__onehotencoder-2': OneHotEncoder(),
 'preprocessor__onehotencoder-2__categories': 'auto',
 'preprocessor__onehotencoder-2__drop': None,
 'preprocessor__onehotencoder-2__dtype': numpy.float64,
 'preprocessor__onehotencoder-2__handle_unknown': 'error',
 'preprocessor__onehotencoder-2__sparse': True,
 'preprocessor__remainder': 'passthrough',
 'preprocessor__sparse_threshold': 0.3,
 'preprocessor__standardscaler': StandardScaler(),
 'preprocessor__standardscaler__copy': True,
 'preprocessor__standardscaler__with_mean': True,
 'preprocessor__standardscaler__with_std': True,
 'preprocessor__transformer_weights': None,
 'preprocessor__transformers': [('knnimputer',
   KNNImputer(),
   ['policePrecinct']),
  ('onehotencoder-1', OneHotEncoder(), ['gender']),
  ('standardscaler', StandardScaler(), ['long', 'lat']),
  ('onehotencoder-2', OneHotEncoder(), ['neighborhood', 'problem'])],
 'preprocessor__verbose': False,
 'preprocessor__verbose_feature_names_out': True,
 'random_under_sampler': TomekLinks(),
 'random_under_sampler__n_jobs': None,
 'random_under_sampler__sampling_strategy': 'auto',
 'steps': [('preprocessor', ColumnTransformer(remainder='passthrough',
                     transformers=[('knnimputer', KNNImputer(),
                                    ['policePrecinct']),
                                   ('onehotencoder-1', OneHotEncoder(),
                                    ['gender']),
                                   ('standardscaler', StandardScaler(),
                                    ['long', 'lat']),
                                   ('onehotencoder-2', OneHotEncoder(),
                                    ['neighborhood', 'problem'])])),
  ('random_under_sampler', TomekLinks()),
  ('classification', KNeighborsClassifier(leaf_size=120))],
 'verbose': False}

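Now, as you can imagine, I need to optimize the model's hyperparameters. However, both GridSearch and RandomizedSearch take forever to run, and the results are almost always unsatisfactory. For these reasons, I decided to switch to Optuna.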

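The examples and tutorials are very good, but they mostly use Optuna for deep learning. On top of that, the way the code has to be written feels intuitively wrong to me: for a short pipeline, writing out the search space by hand is simple, but what about a pipeline like mine?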

So:

Is there a way to plug a grid into the Optuna def objective(trial) function? Examples appreciated!
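To make the question concrete, here is a rough sketch of the kind of thing I have in mind, not a tested solution. It assumes a dict keyed by the names that model.get_params() prints above, where a list means categorical choices and a (low, high) tuple means a numeric range. The names SEARCH_SPACE, suggest_params and make_objective, the ranges, the "f1" scoring (which assumes the positive class is labelled 1) and cv=5 are all made up for illustration.

```python
from typing import Any, Dict

# Hypothetical search space; keys are taken from model.get_params().
# list  -> categorical choices, (low, high) tuple -> int or float range.
SEARCH_SPACE: Dict[str, Any] = {
    "classification__n_neighbors": (1, 30),
    "classification__weights": ["uniform", "distance"],
    "classification__p": [1, 2],
    "classification__leaf_size": (10, 120),
    "preprocessor__knnimputer__n_neighbors": (2, 10),
}


def suggest_params(trial, space: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a grid-like dict into Optuna trial.suggest_* calls."""
    params = {}
    for name, spec in space.items():
        if isinstance(spec, list):
            # A list is treated as a set of categorical choices.
            params[name] = trial.suggest_categorical(name, spec)
        else:
            low, high = spec
            if isinstance(low, int) and isinstance(high, int):
                # Integer bounds are sampled as integers (inclusive).
                params[name] = trial.suggest_int(name, low, high)
            else:
                params[name] = trial.suggest_float(name, low, high)
    return params


def make_objective(model, X, y, space, scoring="f1", cv=5):
    """Build an objective(trial) that cross-validates a clone of `model`."""
    # Imported here so suggest_params stays usable without scikit-learn.
    from sklearn.base import clone
    from sklearn.model_selection import cross_val_score

    def objective(trial):
        # Clone so each trial starts from the original, unfitted pipeline.
        candidate = clone(model).set_params(**suggest_params(trial, space))
        return cross_val_score(candidate, X, y, scoring=scoring, cv=cv).mean()

    return objective


# Intended usage (assumes optuna is installed and model, X, y already exist):
#   import optuna
#   study = optuna.create_study(direction="maximize")
#   study.optimize(make_objective(model, X, y, SEARCH_SPACE), n_trials=50)
#   print(study.best_params)
```

The idea is that the search space stays a plain dict, like a GridSearch param_grid, and only one small helper knows about trial.suggest_categorical, trial.suggest_int and trial.suggest_float.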
