I am trying to oversample with SMOTE and then run my XGBClassifier inside a pipeline. For some reason, I cannot get HyperOpt to work with the Pipeline.

Both of the following examples run fine:

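import numpy as np
from hyperopt import STATUS_OK, Trials, fmin, tpe
from imblearn.over_sampling import SMOTE
# Assumed: imblearn's Pipeline, since sklearn's Pipeline does not accept samplers such as SMOTE.
from imblearn.pipeline import Pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_score
from xgboost import XGBClassifier

# Example 1: cross-validate the SMOTE + XGBoost pipeline directly.
smote = SMOTE(random_state=42)
model = XGBClassifier(random_state=42)
pipe = Pipeline([('smote', smote),
                 ('model', model)])

cv = StratifiedKFold(n_splits=5)

score = cross_val_score(pipe, X_train, y_train, cv=cv, scoring='roc_auc', n_jobs=-1).mean()

print(score)

# Example 2: tune the bare XGBClassifier (no pipeline) with HyperOpt.
model = XGBClassifier(random_state=42)

def objective_pipe(params):
    # Apply the sampled hyperparameters to the classifier.
    model.set_params(**params)

    cv = StratifiedKFold(n_splits=5)

    score = cross_val_score(model, X_train, y_train, cv=cv, scoring='roc_auc', n_jobs=-1).mean()

    # HyperOpt minimizes the loss, so negate the AUC.
    return {'loss': -score, 'params': params, 'status': STATUS_OK}

trials = Trials()
# `params` passed as `space` is the search space, defined elsewhere (not shown).
best = fmin(fn=objective_pipe, space=params, algo=tpe.suggest, max_evals=10, trials=trials, rstate=np.random.RandomState(42))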

However, when I use the Pipeline inside the objective function, the score comes back as NaN.

smote = SMOTE(random_state=42)
model = XGBClassifier(random_state=42)
pipe = Pipeline([('smote', smote),
                 ('model', model)])

def objective_pipe(params):
    # Apply the sampled hyperparameters to the pipeline.
    pipe.set_params(**params)

    cv = StratifiedKFold(n_splits=5)

    # This is the call whose mean comes back as NaN.
    score = cross_val_score(pipe, X_train, y_train, cv=cv, scoring='roc_auc', n_jobs=-1).mean()

    return {'loss': -score, 'params': params, 'status': STATUS_OK}

trials = Trials()
best = fmin(fn=objective_pipe, space=params, algo=tpe.suggest, max_evals=10, trials=trials, rstate=np.random.RandomState(42))

Maybe I am just missing something very simple, but I am not sure how to fix this. Any suggestions, help, or resources are welcome.

1 Answer

I'm not entirely sure why, but I had a similar problem, and it went away once I set n_jobs=1. I think it has to do with SMOTE not being able to run in parallel.
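
The suggestion above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the asker's code: `cv_auc` is a hypothetical helper, and the synthetic dataset and the StandardScaler/LogisticRegression pipeline are stand-ins so the block runs with scikit-learn alone. It runs the folds serially with `n_jobs=1` and also passes `error_score="raise"`, so a failing fit raises its exception instead of being recorded as NaN, which may help show what is actually going wrong.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def cv_auc(estimator, params, X, y, n_splits=5):
    """Hypothetical helper: set params, then return the mean cross-validated ROC AUC."""
    estimator.set_params(**params)
    cv = StratifiedKFold(n_splits=n_splits)
    scores = cross_val_score(
        estimator, X, y,
        cv=cv,
        scoring="roc_auc",
        n_jobs=1,             # run the folds serially, as suggested above
        error_score="raise",  # raise the underlying error instead of recording NaN
    )
    return scores.mean()


# Stand-in data and pipeline (assumptions, not the setup from the question).
X_demo, y_demo = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=42)
demo_pipe = Pipeline([("scale", StandardScaler()), ("model", LogisticRegression())])

# Parameters of a pipeline step are addressed as "<step>__<param>".
demo_score = cv_auc(demo_pipe, {"model__C": 0.5}, X_demo, y_demo)
print(demo_score)
```

In the question's setup, the objective would call `cv_auc(pipe, params, X_train, y_train)` and return the negated score as the loss. Keys in `params` would need the `model__` prefix to reach the XGBClassifier step of the pipeline.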

Answered 2020-12-29T10:31:37.320