I am trying to train a GradientBoosting classifier. Because my data is imbalanced, I am considering SMOTE to balance it. I tried the following:

from sklearn.ensemble import GradientBoostingRegressor
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_boston
from sklearn.metrics import mean_absolute_error

# Import train_test_split function
from sklearn.model_selection import train_test_split

# Split dataset into training set and test set

from imblearn.over_sampling import SMOTE

y=df['Label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, stratify=y)
sm = SMOTE(random_state = 42)
X_train_oversampled, y_train_oversampled = sm.fit_sample(X_train, y_train)
X_train = pd.DataFrame(X_train_oversampled, columns=X_train.columns)

But I get this error:

---> 20 X_train = pd.DataFrame(X_train_oversampled, columns=X_train.columns)

/anaconda3/lib/python3.7/site-packages/scipy/sparse/base.py in __getattr__(self, attr)
    689             return self.getnnz()
    690         else:
--> 691             raise AttributeError(attr + " not found")
    692 
    693     def transpose(self, axes=None, copy=False):

AttributeError: columns not found

I don't know what I should replace, or how to use SMOTE with X_train and y_train. Could you show me how to use it in the correct order?

1 Answer

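To be fair, you have not provided enough code or data, nor the full traceback. Still, the fact that the error occurs on the last line suggests that SMOTE itself ran fine. The error arises because X_train is a sparse matrix (the traceback comes from scipy/sparse/base.py), which has no column names and therefore no columns attribute. It looks like you had column names at some point, so you should be able to get them from df.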
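As a minimal sketch of that idea, assuming the feature names can be read from df before it is turned into a sparse matrix: the small df, the names f1 and f2, and the helper to_frame below are hypothetical stand-ins, not the asker's data. The SMOTE call is left out so the snippet runs with only pandas and SciPy; its output would be passed to to_frame in place of X_sparse. Note also that recent imbalanced-learn releases name the method fit_resample rather than fit_sample.

```python
import pandas as pd
from scipy import sparse

# Hypothetical stand-in for the asker's df (the real data was not posted).
df = pd.DataFrame({
    "f1": [0.0, 1.5, 0.0, 2.0],
    "f2": [3.0, 0.0, 0.0, 1.0],
    "Label": [0, 0, 1, 0],
})

# Save the feature names while they still exist.
feature_names = [c for c in df.columns if c != "Label"]

# A sparse feature matrix, like the X_train in the question.
# It carries no column names, so X_sparse.columns raises AttributeError.
X_sparse = sparse.csr_matrix(df[feature_names].to_numpy())


def to_frame(X_resampled, names):
    """Wrap a (possibly sparse) feature matrix in a DataFrame with the saved names."""
    if sparse.issparse(X_resampled):
        # Keeps the data sparse instead of densifying it.
        return pd.DataFrame.sparse.from_spmatrix(X_resampled, columns=names)
    return pd.DataFrame(X_resampled, columns=names)


# In the question, the first argument would be X_train_oversampled.
X_df = to_frame(X_sparse, feature_names)
print(X_df.columns.tolist())
```

The point of the design is that the names are taken from df (or from whatever step produced the sparse matrix), never from the sparse matrix itself.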

Answered 2020-10-12T02:25:56.210