I am trying to tune the hyperparameters of a DecisionTreeClassifier with GridSearchCV, and because my data is imbalanced I am trying to use imblearn.over_sampling.RandomOverSampler.

from imblearn.over_sampling import RandomOverSampler

dtpass = tree.DecisionTreeClassifier()
pipe1 = Pipeline([('sampling', RandomOverSampler()), ('class', dtpass)])

parameters = {'class__max_depth': range(3,7), 
          'class__ccp_alpha': np.arange(0, 0.001, 0.00025), 
          'class__min_samples_leaf' : [50]
         }

dt2 = GridSearchCV(estimator = pipe1, 
               param_grid = parameters,
               n_jobs = 4,
              scoring = 'roc_auc'
)

dt2.fit(x, y)

This returns an error:

AttributeError: 'RandomOverSampler' object has no attribute '_validate_data'

What am I doing wrong here?

Edit: solution posted below.

2 Answers

Try this:

from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import Pipeline  # imblearn's Pipeline accepts samplers as steps
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
import numpy as np

dtpass = DecisionTreeClassifier()
sampling = RandomOverSampler()

# Name the steps explicitly so the 'class__' prefix in the grid below resolves.
# make_pipeline would auto-name them 'randomoversampler' and 'decisiontreeclassifier'.
pipe1 = Pipeline([('sampling', sampling), ('class', dtpass)])

parameters = {'class__max_depth': range(3, 7),
              'class__ccp_alpha': np.arange(0, 0.001, 0.00025),
              'class__min_samples_leaf': [50]}

dt2 = GridSearchCV(estimator=pipe1,
                   param_grid=parameters,
                   n_jobs=4,
                   scoring='roc_auc')

# x and y are the feature matrix and labels from the question
dt2.fit(x, y)
answered 2020-06-20T17:34:38.113

Link to the page with the solution, which took a lot of googling to find:

https://makerspace.aisingapore.org/community/ai4i-5-supervised-learning/encountered-attributeerror-when-run-train_test_splitpreprocessed_data-output_var-after-randomoversampler/

The solution is to install with

 pip install -U imbalanced-learn

instead of

 conda install -c conda-forge imbalanced-learn
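
A plausible reason the reinstall helps, although the linked page is the authority here, is that the two installs gave mismatched versions of imbalanced-learn and scikit-learn. The error mentions `_validate_data`, a method that newer imbalanced-learn releases expect the scikit-learn base class to provide. The sketch below uses only the standard library to print what is installed. The helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as published on PyPI.
for name in ("scikit-learn", "imbalanced-learn"):
    print(name, installed_version(name))
```

Running this before and after the reinstall shows whether the versions actually changed.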
answered 2020-06-20T18:46:48.083