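I am using SMOTE because my dataset is imbalanced, but I get the error shown below. I found an earlier post on this forum about the same error; it suggested the cause was duplicate column names. I checked my dataset and it has no duplicate column names, yet I still get the error. My dataset contains categorical variables, all of which have been converted to 1s and 0s.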
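
For reference, the duplicate-column check mentioned above can be done roughly like this. It is a minimal sketch that assumes the features are in a pandas DataFrame; `X_demo` is a toy frame standing in for `X_train`, and its data is illustrative only.

```python
import pandas as pd

# Toy frame standing in for X_train (illustrative data, not from the post).
X_demo = pd.DataFrame([[1, 0, 1], [0, 1, 0]], columns=["a", "b", "a"])

# Column labels that occur more than once.
dup_names = X_demo.columns[X_demo.columns.duplicated()].tolist()
print(dup_names)

# One common way to keep only the first occurrence of each label.
X_dedup = X_demo.loc[:, ~X_demo.columns.duplicated()]
print(X_dedup.columns.tolist())
```

An empty list from the first step means every column label is unique; `X_train.columns.is_unique` gives the same answer as a single boolean.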

from imblearn.over_sampling import SMOTE

sm = SMOTE(random_state=2)
X_train_res, y_train_res = sm.fit_resample(X_train, y_train)

This is the error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-147-9ac314f8e551> in <module>
  1 sm = SMOTE(random_state = 2)
 ----> 2 X_train_res, y_train_res = sm.fit_resample(X_train, y_train)

 ~\Anaconda3\lib\site-packages\imblearn\base.py in fit_resample(self, X, y)
    86               if binarize_y else output[1])
    87 
---> 88         X_, y_ = arrays_transformer.transform(output[0], y_)
    89         return (X_, y_) if len(output) == 2 else (X_, y_, output[2])
    90 

~\Anaconda3\lib\site-packages\imblearn\utils\_validation.py in transform(self, X, y)
     38 
     39     def transform(self, X, y):
---> 40         X = self._transfrom_one(X, self.x_props)
     41         y = self._transfrom_one(y, self.y_props)
     42         return X, y

~\Anaconda3\lib\site-packages\imblearn\utils\_validation.py in _transfrom_one(self, array, props)
     57             import pandas as pd
     58             ret = pd.DataFrame(array, columns=props["columns"])
---> 59             ret = ret.astype(props["dtypes"])
     60         elif type_ == "series":
     61             import pandas as pd

~\Anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors)
   5681                 if col_name in dtype:
   5682                     results.append(
-> 5683                         col.astype(dtype=dtype[col_name], copy=copy, errors=errors)
   5684                     )
   5685                 else:

~\Anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors)
   5664             if self.ndim == 1:  # i.e. Series
   5665                 if len(dtype) > 1 or self.name not in dtype:
-> 5666                     raise KeyError(
   5667                         "Only the Series name can be used for "
   5668                         "the key in Series dtype mappings."

KeyError: 'Only the Series name can be used for the key in Series dtype mappings.'

1 Answer

I resolved this error as follows. Depending on the shape of your y_train, you may or may not need ravel().

sm = SMOTE(random_state = 2)
X_train_res, y_train_res = sm.fit_resample(X_train.values, y_train.ravel())
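
Judging by the traceback, the failure happens when imblearn rebuilds a DataFrame from the resampled array and restores the original dtypes. Passing plain NumPy arrays appears to skip that step, but the result then comes back without column names. The sketch below shows one way to put the labels back afterwards. `resample_via_numpy` is a hypothetical helper, not part of imblearn, and it assumes `X` is a DataFrame whose columns are all numeric.

```python
import numpy as np
import pandas as pd


def resample_via_numpy(sampler, X, y):
    """Hypothetical helper (not an imblearn API).

    Calls sampler.fit_resample on plain NumPy arrays, then re-attaches
    the original column labels to the resampled features.
    Assumes X is a pandas DataFrame with numeric columns only.
    """
    columns = list(X.columns)
    y_name = getattr(y, "name", None)

    # to_numpy() drops labels and dtypes; ravel() flattens y to 1-D.
    X_arr = X.to_numpy()
    y_arr = np.asarray(y).ravel()

    X_res, y_res = sampler.fit_resample(X_arr, y_arr)

    # Rebuild pandas objects; the original per-column dtypes are not restored.
    X_res = pd.DataFrame(X_res, columns=columns)
    y_res = pd.Series(y_res, name=y_name)
    return X_res, y_res


# Intended usage (requires imbalanced-learn to be installed):
#   from imblearn.over_sampling import SMOTE
#   X_train_res, y_train_res = resample_via_numpy(
#       SMOTE(random_state=2), X_train, y_train
#   )
```

Any object with a `fit_resample(X, y)` method can be passed as `sampler`. The index of the returned frame is a fresh range index, since the synthetic rows have no original index to keep.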

Answered 2021-02-15T05:49:51.263