python - Hyperopt: when I load a saved sklearn model, how do I know which variables were selected for the best model?

Question

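I trained a gradient boosting classifier (XGBClassifier, through its scikit-learn API) and tuned it with Hyperopt. Hyperopt selected only 20 of the 769 variables. However, when I load the saved model for a blind test, it is not clear which variables were selected. Here is the code: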

from xgboost import XGBClassifier

from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,f1_score,recall_score

# objective: 'multi:softprob' for multiclass, 'binary:logistic' for binary ('mlogloss' is an eval metric)

def accuracy(params):
    # eval_set and verbose are arguments of fit(), not of the XGBClassifier constructor
    clf = XGBClassifier(**params, learning_rate=0.7, objective='binary:logistic',
                        booster='gbtree', n_jobs=64, eval_metric="error")
    clf.fit(X_train, y_train, eval_set=eval_set, verbose=True)
    return clf.score(X_test, y_test)

eval_set = [(X_test, y_test)]

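# hp.choice samples one of the listed options, but fmin() reports the *index* of the chosen option
parameters = {
    'n_estimators': hp.choice('n_estimators', range(20, 40)),
    'max_depth': hp.choice('max_depth', range(4, 100)),
    'gamma': hp.choice('gamma', range(0, 10)),
    'min_child_weight': hp.choice('min_child_weight', range(0, 1)),  # range(0, 1) contains only 0
    # 'num_features' is not a documented XGBClassifier parameter; it does not select a feature subset
    'num_features': hp.choice('num_features', range(10, X_train.shape[1])),
    'max_delta_step': hp.choice('max_delta_step', range(0, 10))}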


best_acc = 0
def f(params):
    global best_acc
    acc = accuracy(params)
    if acc > best_acc:
        best_acc = acc
    print('Improving:', best_acc, params)
    # fmin minimizes 'loss', so return the negative accuracy
    return {'loss': -acc, 'status': STATUS_OK}

trials = Trials()

best = fmin(f, parameters, algo=tpe.suggest, max_evals=80, trials=trials)
print('best (option indices):', best)

# fmin() returns indices for hp.choice parameters; map them back to the actual values
from hyperopt import space_eval
best_params = space_eval(parameters, best)
print('best (values):', best_params)

# same learning_rate as in accuracy(), so the tuned configuration is reproduced
clf = XGBClassifier(**best_params, learning_rate=0.7, objective='binary:logistic',
                    booster='gbtree', n_jobs=64, eval_metric="error")
clf.fit(X_train, y_train, eval_set=eval_set, verbose=True)
clf.score(X_test, y_test)

import joblib
filename = '/home/rubens.../modelos/Argumenta_Multi.sav'


joblib.dump(clf, filename)


loaded_model = joblib.load(filename)
result = loaded_model.predict(X_new)

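How can I find out which 20 variables Hyperopt selected? I am wary of running chi-squared (SelectKBest with k = 20) and saving that together with the Hyperopt weights, because Hyperopt may not have used chi-squared to select the variables.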
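A fitted model only remembers how many columns it was trained on, not which ones were picked upstream. One way to make the selection recoverable is to make it an explicit, saved step: a scikit-learn Pipeline whose first step is a selector with k as a tunable hyperparameter. This is a sketch under stated assumptions, not what the code above does: it uses SelectKBest with f_classif (chi2 would require non-negative features), sklearn's GradientBoostingClassifier instead of XGBClassifier so it runs without XGBoost, synthetic data with 769 columns, and a hypothetical k = 20 standing in for a searched value.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the real data: 769 columns, as in the question.
X, y = make_classification(n_samples=200, n_features=769,
                           n_informative=10, random_state=0)

k = 20  # hypothetical value; in practice a hyperparameter chosen by the search
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=k)),
    ("clf", GradientBoostingClassifier(n_estimators=20, random_state=0)),
])
pipe.fit(X, y)

# Column indices (into the original 769) that the selector kept.
selected = pipe.named_steps["select"].get_support(indices=True)
print(selected)

# The fitted pipeline still accepts all 769 columns and selects internally.
preds = pipe.predict(X)
```

With Hyperopt, the objective function could build this pipeline from its params (for example k = params['k']), and saving the whole fitted pipeline with joblib.dump keeps the selection and the classifier together.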

At result = loaded_model... I get the following error:

ValueError: X has 769 features, but DecisionTreeClassifier is expecting 20 features as input.
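The message means the estimator was fitted on 20 columns while predict() received all 769. A minimal sketch, with synthetic data and hypothetical column indices, of saving the column list next to the model and applying it to new data before calling predict(); DecisionTreeClassifier is used only because it is the class named in the error.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 769))
y_train = (X_train[:, 5] > 0).astype(int)
X_new = rng.normal(size=(10, 769))

# Hypothetical: indices of the 20 columns the model was actually trained on.
cols = np.arange(20)
model = DecisionTreeClassifier(random_state=0).fit(X_train[:, cols], y_train)

# Save the model together with the columns it expects.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump({"model": model, "columns": cols}, path)
bundle = joblib.load(path)

try:
    bundle["model"].predict(X_new)  # all 769 columns -> ValueError
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Selecting the saved columns first gives the model the 20 inputs it expects.
result = bundle["model"].predict(X_new[:, bundle["columns"]])
```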

I also don't know whether the features Hyperopt kept follow the sklearn feature importances of the best Hyperopt model saved earlier:

model.feature_importances_
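feature_importances_ holds one score per column the model was fitted on; it describes how the trees used each column, not a selection made by Hyperopt. A minimal sketch on synthetic data, using sklearn's GradientBoostingClassifier (XGBClassifier exposes an attribute with the same name): columns with zero importance never contributed to a split, and argsort gives a top-k ranking.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=50,
                           n_informative=5, random_state=0)
clf = GradientBoostingClassifier(n_estimators=30, max_depth=3,
                                 random_state=0).fit(X, y)

importances = clf.feature_importances_       # one value per input column
used = np.flatnonzero(importances)            # columns with nonzero importance
top20 = np.argsort(importances)[::-1][:20]    # 20 highest-importance columns
print(len(used), top20)
```

Taking the top 20 of this ranking after training is a separate selection method; it only ranks the columns of a model that was given all of them.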
