I successfully built a tuned model using SMOTEENN and a random forest (RF) in a pipeline, like this:
import random
from multiprocessing import cpu_count  # cpu_count() is used for n_jobs below
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import roc_curve, roc_auc_score, confusion_matrix
from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTEENN
from imblearn.pipeline import Pipeline
After loading the data and obtaining the X_train, X_test, y_train, and y_test matrices, I successfully ran sklearn's RandomizedSearchCV as follows:
seed = 1706
knn = 10
smoted = SMOTE(sampling_strategy = 'auto',
               k_neighbors = knn,
               random_state = seed)
mydata = pd.read_csv(datapath)
params_rf = {
    'rf__max_depth':[8, 14, 20, 26],
    'rf__min_samples_leaf':[8, 15, 22, 29],
    'rf__max_features':[6, 12, 18, 24, 30],
    'rf__n_estimators':[400, 800]
}
smote_enn = SMOTEENN(smote = smoted)
rf = RandomForestClassifier(criterion = 'gini')
pipeline = Pipeline([('smote_enn', smote_enn), ('rf', rf)]) #<-pipeline with smote and model steps
random.seed(1706)
grid_rf = RandomizedSearchCV(estimator = pipeline,
                             param_distributions = params_rf,
                             scoring = 'roc_auc',
                             cv = 8,
                             n_jobs = cpu_count()-2,
                             refit = True,
                             return_train_score = False,
                             n_iter = 80)
grid_rf.fit(X_train, y_train.values.ravel())
My question is: can anyone help me figure out why I can't do the same thing with Dask's RandomizedSearchCV? Here is the code and the error I get:
from dask_ml.model_selection import RandomizedSearchCV as DaskRandomGridSearchCV
grid_rf = DaskRandomGridSearchCV(estimator = pipeline,
                                 param_distributions = params_rf,
                                 scoring = 'roc_auc',
                                 cv = 8,
                                 ###n_jobs = cpu_count()-2, <-not needed b/c of dask
                                 refit = True,
                                 return_train_score = False,
                                 n_iter = 80)
grid_rf.fit(X_train, y_train.values.ravel())
AttributeError: 'SMOTEENN' object has no attribute 'transform'
Why does this work with sklearn's RandomizedSearchCV but not with Dask's RandomizedSearchCV?
I have upgraded all libraries to their latest versions. I am using Python 3.6.9 (I also tried on another machine running Python 3.7.3 and got the same error).
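To illustrate what the error message seems to point at, here is a minimal pure-Python sketch. It is not Dask-ML's or imbalanced-learn's actual code: ToySampler, ToyModel, fit_transform_only, and fit_sampler_aware are hypothetical names made up for this example. It assumes that the failing code path treats every intermediate pipeline step as a transformer and calls transform on it, whereas a sampler such as SMOTEENN exposes fit_resample instead.

```python
class ToySampler:
    """Hypothetical stand-in for a resampler: has fit and fit_resample, no transform."""

    def fit(self, X, y):
        return self

    def fit_resample(self, X, y):
        # Crude oversampling: append one extra copy of every row labelled 1.
        extra = [(x, t) for x, t in zip(X, y) if t == 1]
        X_out = list(X) + [x for x, _ in extra]
        y_out = list(y) + [t for _, t in extra]
        return X_out, y_out


class ToyModel:
    """Hypothetical final estimator that only records how many rows it saw."""

    def fit(self, X, y):
        self.n_seen_ = len(X)
        return self


def fit_transform_only(steps, X, y):
    """Runner that assumes every intermediate step is a transformer."""
    for _, step in steps[:-1]:
        X = step.fit(X, y).transform(X)  # fails for a sampler
    return steps[-1][1].fit(X, y)


def fit_sampler_aware(steps, X, y):
    """Runner that resamples when a step offers fit_resample."""
    for _, step in steps[:-1]:
        if hasattr(step, "fit_resample"):
            X, y = step.fit_resample(X, y)  # changes both X and y
        else:
            X = step.fit(X, y).transform(X)
    return steps[-1][1].fit(X, y)


X = [[0], [1], [2], [3]]
y = [0, 0, 0, 1]
steps = [("sampler", ToySampler()), ("model", ToyModel())]

# The sampler-aware runner trains on the resampled data.
model = fit_sampler_aware(steps, X, y)

# The transformer-only runner raises an AttributeError like the one above.
try:
    fit_transform_only(steps, X, y)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

If that assumption holds, the sampler-aware branch is the part the failing search would be missing. I have not confirmed this against the Dask-ML source, so treat it as a guess about the cause rather than a diagnosis.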