I'm fairly new to machine learning. I'm currently working on an imbalanced binary classification problem, and I need to test different models and how they perform under different sampling techniques. I have settled on a pipeline called model, which consists of a ColumnTransformer preprocessor, a TomekLinks sampler, and a KNeighborsClassifier estimator. Calling model.get_params() gives the following output:
{'classification': KNeighborsClassifier(leaf_size=120),
'classification__algorithm': 'auto',
'classification__leaf_size': 120,
'classification__metric': 'minkowski',
'classification__metric_params': None,
'classification__n_jobs': None,
'classification__n_neighbors': 5,
'classification__p': 2,
'classification__weights': 'uniform',
'memory': None,
'preprocessor': ColumnTransformer(remainder='passthrough',
transformers=[('knnimputer', KNNImputer(),
['policePrecinct']),
('onehotencoder-1', OneHotEncoder(),
['gender']),
('standardscaler', StandardScaler(),
['long', 'lat']),
('onehotencoder-2', OneHotEncoder(),
['neighborhood', 'problem'])]),
'preprocessor__knnimputer': KNNImputer(),
'preprocessor__knnimputer__add_indicator': False,
'preprocessor__knnimputer__copy': True,
'preprocessor__knnimputer__metric': 'nan_euclidean',
'preprocessor__knnimputer__missing_values': nan,
'preprocessor__knnimputer__n_neighbors': 5,
'preprocessor__knnimputer__weights': 'uniform',
'preprocessor__n_jobs': None,
'preprocessor__onehotencoder-1': OneHotEncoder(),
'preprocessor__onehotencoder-1__categories': 'auto',
'preprocessor__onehotencoder-1__drop': None,
'preprocessor__onehotencoder-1__dtype': numpy.float64,
'preprocessor__onehotencoder-1__handle_unknown': 'error',
'preprocessor__onehotencoder-1__sparse': True,
'preprocessor__onehotencoder-2': OneHotEncoder(),
'preprocessor__onehotencoder-2__categories': 'auto',
'preprocessor__onehotencoder-2__drop': None,
'preprocessor__onehotencoder-2__dtype': numpy.float64,
'preprocessor__onehotencoder-2__handle_unknown': 'error',
'preprocessor__onehotencoder-2__sparse': True,
'preprocessor__remainder': 'passthrough',
'preprocessor__sparse_threshold': 0.3,
'preprocessor__standardscaler': StandardScaler(),
'preprocessor__standardscaler__copy': True,
'preprocessor__standardscaler__with_mean': True,
'preprocessor__standardscaler__with_std': True,
'preprocessor__transformer_weights': None,
'preprocessor__transformers': [('knnimputer',
KNNImputer(),
['policePrecinct']),
('onehotencoder-1', OneHotEncoder(), ['gender']),
('standardscaler', StandardScaler(), ['long', 'lat']),
('onehotencoder-2', OneHotEncoder(), ['neighborhood', 'problem'])],
'preprocessor__verbose': False,
'preprocessor__verbose_feature_names_out': True,
'random_under_sampler': TomekLinks(),
'random_under_sampler__n_jobs': None,
'random_under_sampler__sampling_strategy': 'auto',
'steps': [('preprocessor', ColumnTransformer(remainder='passthrough',
transformers=[('knnimputer', KNNImputer(),
['policePrecinct']),
('onehotencoder-1', OneHotEncoder(),
['gender']),
('standardscaler', StandardScaler(),
['long', 'lat']),
('onehotencoder-2', OneHotEncoder(),
['neighborhood', 'problem'])])),
('random_under_sampler', TomekLinks()),
('classification', KNeighborsClassifier(leaf_size=120))],
'verbose': False}
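
For reference, the pipeline is assembled roughly like this (a minimal sketch reconstructed from the get_params() output above, not necessarily my exact code; the column names come from my dataset):

# Minimal sketch of the pipeline, reconstructed from get_params() above
from imblearn.pipeline import Pipeline
from imblearn.under_sampling import TomekLinks
from sklearn.compose import make_column_transformer
from sklearn.impute import KNNImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# ColumnTransformer matching the transformers listed in get_params()
preprocessor = make_column_transformer(
    (KNNImputer(), ['policePrecinct']),
    (OneHotEncoder(), ['gender']),
    (StandardScaler(), ['long', 'lat']),
    (OneHotEncoder(), ['neighborhood', 'problem']),
    remainder='passthrough',
)

# imblearn Pipeline so the TomekLinks resampling is applied only during fit
model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('random_under_sampler', TomekLinks()),
    ('classification', KNeighborsClassifier(leaf_size=120)),
])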
Now, as you can imagine, I need to optimize the model's hyperparameters. However, both GridSearchCV and RandomizedSearchCV take forever to run, and the results are almost always unsatisfactory. For these reasons, I decided to give Optuna a try.
The examples and tutorials are very good, but they mostly use Optuna for deep learning. Moreover, the way the code has to be written feels counterintuitive to me (writing out a grid for a short pipeline is straightforward, but what about a pipeline like mine?).
So:
Is there a way to plug a grid like this into an Optuna objective(trial) function? Examples appreciated!
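
To make the question concrete, this is roughly the kind of objective I am picturing (an unverified sketch only; X_train and y_train are my training data, and the parameter names are taken from the get_params() output above):

# Rough, unverified sketch of the objective I have in mind
import optuna
from sklearn.model_selection import cross_val_score

def objective(trial):
    # suggest values for the nested pipeline parameters via their get_params() names
    params = {
        'classification__n_neighbors': trial.suggest_int('classification__n_neighbors', 3, 50),
        'classification__weights': trial.suggest_categorical('classification__weights', ['uniform', 'distance']),
        'classification__p': trial.suggest_int('classification__p', 1, 2),
        'classification__leaf_size': trial.suggest_int('classification__leaf_size', 10, 200),
    }
    model.set_params(**params)
    # f1 instead of accuracy because the classes are imbalanced
    return cross_val_score(model, X_train, y_train, cv=5, scoring='f1').mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)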