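I trained an sklearn Gradient Boosting classifier and tuned it with Hyperopt. Hyperopt selected only 20 of the 769 variables. However, when I load the saved weights for a blind test, it is not clear which variables were selected. Here is the code: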
from xgboost import XGBClassifier
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score, precision_score, confusion_matrix, f1_score, recall_score

# multi:mlogloss // binary:logistic
# Evaluation set reported during fitting; defined before accuracy() uses it.
eval_set = [(X_test, y_test)]

def accuracy(params):
    # Train one candidate model and return its accuracy on the test split.
    clf = XGBClassifier(**params, learning_rate=0.7, objective='binary:logistic',
                        booster='gbtree', n_jobs=64, eval_metric="error")
    # eval_set and verbose are arguments of fit(), not of the constructor.
    clf.fit(X_train, y_train, eval_set=eval_set, verbose=True)
    return clf.score(X_test, y_test)
parameters = {
    'n_estimators': hp.choice('n_estimators', range(20, 40)),
    'max_depth': hp.choice('max_depth', range(4, 100)),
    'gamma': hp.choice('gamma', range(0, 10)),
    "min_child_weight": hp.choice("min_child_weight", range(0, 1)),
    "num_features": hp.choice("num_features", range(10, X_train.shape[1])),
    "max_delta_step": hp.choice("max_delta_step", range(0, 10)),
}
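best = 0

def f(params):
    # Objective for fmin: Hyperopt minimizes, so return the negated accuracy.
    global best
    acc = accuracy(params)
    if acc > best:
        best = acc
        print('Improving:', best, params)
    return {'loss': -acc, 'status': STATUS_OK}

trials = Trials()
# Note: this rebinds `best` from the running best accuracy to fmin's result dict.
best = fmin(f, parameters, algo=tpe.suggest, max_evals=80, trials=trials)
print('best:', best)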
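# For hp.choice, fmin returns the index of the chosen option, not its value;
# space_eval maps those indices back to the actual parameter values.
from hyperopt import space_eval
best_params = space_eval(parameters, best)
# num_features is kept from the original post; it is not among the documented XGBClassifier parameters.
clf = XGBClassifier(gamma=best_params['gamma'], max_delta_step=best_params['max_delta_step'],
                    max_depth=best_params['max_depth'], learning_rate=0.1,
                    n_estimators=best_params['n_estimators'], objective='binary:logistic',
                    min_child_weight=best_params['min_child_weight'], num_features=best_params['num_features'],
                    booster='gbtree', n_jobs=64, eval_metric="error")
clf.fit(X_train, y_train, eval_set=eval_set, verbose=True)
clf.score(X_test, y_test)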
import joblib
filename = '/home/rubens.../modelos/Argumenta_Multi.sav'
joblib.dump(clf, filename)
loaded_model = joblib.load(filename)
result = loaded_model.predict(X_new)
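How can I find out which 20 variables Hyperopt selected? I am wary of applying chi-squared selection (SelectKBest with k=20) to the data and then using the weights saved from Hyperopt, because Hyperopt may not have used chi-squared for its variable selection.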
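One way to make the selected variables recoverable is to perform the selection explicitly and save it together with the classifier. The sketch below is only an illustration under stated assumptions: it uses synthetic data, swaps in sklearn's `GradientBoostingClassifier`, and uses `SelectKBest(chi2, k=20)` as the selector. It does not recover what the saved model in the question actually used. The names `X_demo`, `y_demo`, `X_new_demo`, and `pipeline_demo.sav` are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the real data (assumption): 769 non-negative features,
# since chi2 requires non-negative inputs.
rng = np.random.default_rng(0)
X_demo = rng.random((200, 769))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 1.0).astype(int)

# Selector and classifier live in one object, so the selection is saved too.
pipe = Pipeline([
    ("select", SelectKBest(chi2, k=20)),
    ("clf", GradientBoostingClassifier(n_estimators=20, random_state=0)),
])
pipe.fit(X_demo, y_demo)

# Column indices (into the original 769 columns) that the selector kept.
selected_idx = pipe.named_steps["select"].get_support(indices=True)
print("selected columns:", selected_idx)

# Round-trip through joblib, then predict directly on full-width new data.
X_new_demo = rng.random((5, 769))
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipeline_demo.sav")
    joblib.dump(pipe, path)
    loaded_pipe = joblib.load(path)
pred = loaded_pipe.predict(X_new_demo)
```

Because the loaded pipeline applies the same column selection before the classifier, new data can keep all 769 columns, and `selected_idx` records exactly which ones were used.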
At the line result = loaded_model.predict(X_new), I get the following error:
ValueError: X has 769 features, but DecisionTreeClassifier is expecting 20 features as input.
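A quick way to confirm this kind of mismatch is to compare the number of features the fitted estimator saw with the width of the new data. The following is a minimal sketch on synthetic data, assuming scikit-learn 0.24 or later (where fitted estimators expose `n_features_in_`). The names `X_small`, `X_wide`, and `mismatch_error` are hypothetical, and the model here is a stand-in rather than the one saved in the question.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Stand-in model (assumption): fitted on only 20 columns.
X_small = rng.random((100, 20))
y_small = (X_small[:, 0] > 0.5).astype(int)
model = GradientBoostingClassifier(n_estimators=10, random_state=0)
model.fit(X_small, y_small)

# New data with the full 769 columns, as in the blind test.
X_wide = rng.random((5, 769))

# Compare what the model expects with what it is being given.
print("model expects:", model.n_features_in_, "columns")
print("new data has:", X_wide.shape[1], "columns")

# Predicting on the wider matrix raises a ValueError about the feature count.
try:
    model.predict(X_wide)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```

If the two numbers differ, the new data has to be reduced to the same columns, in the same order, that were used when the model was fitted.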
I also don't know whether Hyperopt followed the sklearn feature importances of the best Hyperopt model saved earlier:
model.feature_importances_
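For reference, `feature_importances_` holds one value per column the model was fitted on, indexed by column position. The sketch below, on synthetic data with a stand-in `GradientBoostingClassifier` (both assumptions), ranks the columns by importance. Note that a model fitted on 20 columns yields only 20 values, so mapping them back to the original 769 variables still requires knowing which columns were passed in.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic data (assumption): 20 columns, label driven only by column 3.
X_demo = rng.random((200, 20))
y_demo = (X_demo[:, 3] > 0.5).astype(int)

model = GradientBoostingClassifier(n_estimators=20, random_state=0)
model.fit(X_demo, y_demo)

# One importance per training column; positions refer to the fitted matrix.
importances = model.feature_importances_

# Column positions sorted from most to least important.
ranking = np.argsort(importances)[::-1]
for idx in ranking[:5]:
    print(int(idx), round(float(importances[idx]), 4))
```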