I am working on a severely imbalanced multi-class classification problem. I want to use the class_weight option that many scikit-learn models provide. What is the best and correct way to do this inside a pipeline? As I see in the documentation, scale_pos_weight is only for binary classification.
This answer here, which has 15 upvotes, from "Firas Omrane" gave me some ideas, so I used:
import numpy as np
from sklearn.utils import class_weight
from xgboost import XGBClassifier

# Per-class 'balanced' weights, ordered by np.unique(y_train)
classes_weights = list(class_weight.compute_class_weight('balanced',
                                                         classes=np.unique(y_train),
                                                         y=y_train))

# Map each sample's label to its class weight
# (labels are assumed to be 1..n_classes here)
weights = np.ones(y_train.shape[0], dtype='float')
for i, val in enumerate(y_train):
    weights[i] = classes_weights[val - 1]

XGBClassifier().fit(x_train, y_train, sample_weight=weights)
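As a side note, scikit-learn also exposes compute_sample_weight, which I believe collapses the per-sample loop above into a single call. A minimal sketch with hypothetical toy labels (the label values 1, 2, 3 are made up for illustration):

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical imbalanced labels: four of class 1, two of class 2, one of class 3
y_train = np.array([1, 1, 1, 1, 2, 2, 3])

# One call yields the same per-sample 'balanced' weights as the manual loop:
# each sample gets n_samples / (n_classes * count(its class))
weights = compute_sample_weight(class_weight='balanced', y=y_train)
print(weights)
```

The total weight assigned to every class comes out equal (n_samples / n_classes each), which is what "balanced" means here.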
It works fine with a plain fit, but when used as:
('clf', XGBClassifier(class_weight='balanced', n_jobs=-1, objective='multi:softprob', sample_weight=classes_weights))  # last step of the pipeline
it gives the error below:
WARNING: /tmp/build/80754af9/xgboost-split_1619724447847/work/src/learner.cc:541:
Parameters: { class_weight, sample_weight } might not be used.
This may not be accurate due to some parameters are only used in language bindings but
passed down to XGBoost core. Or some parameters are not used but slip through this
verification. Please open an issue if you find above cases.
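For context on what I have tried to work around this: since sample_weight is a fit-time argument rather than a constructor parameter, my understanding is that a Pipeline can route it to the final step with the step-name prefix (clf__sample_weight). A minimal sketch with made-up toy data, using scikit-learn's LogisticRegression purely as a stand-in classifier (the same routing should apply to any final step whose fit accepts sample_weight, including XGBClassifier):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical imbalanced toy data: three classes with 40/15/5 samples
rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = np.array([0] * 40 + [1] * 15 + [2] * 5)

weights = compute_sample_weight(class_weight='balanced', y=y)

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression(max_iter=200)),  # stand-in for XGBClassifier
])

# Route the fit-time sample_weight to the 'clf' step via the step-name prefix
pipe.fit(X, y, clf__sample_weight=weights)
preds = pipe.predict(X)
```

This avoids passing sample_weight to the constructor, which is what appears to trigger the warning above.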