> In: data.dtypes
Out: Organization Name object
Money Raised Currency (in USD) float64
Announced Date datetime64[ns]
Total Funding Amount Currency (in USD) float64
Organization Description object
Organization Location object
Raised Series A int64
Primary Industry object
Sub_Ind object
Sub_Ind2 object
Sub_Ind3 object
Sub_Ind4 object
Sub_Ind5 object
Sub_Ind6 object
Sub_Ind7 object
Investor1 object
Investor2 object
Investor3 object
Investor4 object
Investor5 object
Investor6 object
Investor7 object
Investor8 object
Investor9 object
Investor10 object
Investor11 object
> In: x = data.drop(columns=['Raised Series A', 'Announced Date'])
> In: y = data['Raised Series A']
> In: from imblearn.over_sampling import SMOTENC
> In: smote_nc = SMOTENC(categorical_features=[0,1,3,4,5,7,8,9,10,11,12,13,14,15,16,17,
18,19,20,21,22,23,24], random_state=0)
> In: x_resampled, y_resampled = smote_nc.fit_resample(x, y)
---------------------------------------------------------------------------
Out: ValueError Traceback (most recent call last)
in
----> 1 x_resampled, y_resampled = smote_nc.fit_resample(x, y)
~/opt/anaconda3/envs/unit2/lib/python3.7/site-packages/imblearn/base.py in fit_resample(self, X, y)
81 )
82
---> 83 output = self._fit_resample(X, y)
84
85 y_ = (label_binarize(output[1], np.unique(y))
~/opt/anaconda3/envs/unit2/lib/python3.7/site-packages/imblearn/over_sampling/_smote.py in _fit_resample(self, X, y)
936 def _fit_resample(self, X, y):
937 self.n_features_ = X.shape[1]
--> 938 self._validate_estimator()
939
940 # compute the median of the standard deviation of the minority class
~/opt/anaconda3/envs/unit2/lib/python3.7/site-packages/imblearn/over_sampling/_smote.py in _validate_estimator(self)
921 raise ValueError(
922 "Some of the categorical indices are out of range. Indices"
--> 923 " should be between 0 and {}".format(self.n_features_)
924 )
925 self.categorical_features_ = categorical_features
ValueError: Some of the categorical indices are out of range. Indices should be between 0 and 24
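I have been trying different combinations of columns in the categorical_features parameter, but none of them work. There are also no null values in my DataFrame. The reason I'm using SMOTENC is that my target vector is heavily skewed: 99.7% yes, 0.3% no. Please help.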
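
One way to sanity-check the indices is sketched below. It assumes the dtypes listing above is the complete column list of `data`, so the column names and dtypes are copied by hand into a plain dict rather than read from the real DataFrame. The variable names (`dtypes`, `requested`, and so on) are illustrative only. The sketch counts the columns left in `x` after the two drops and compares the positions of the object columns with the list passed to `categorical_features`.

```python
# Column names and dtypes copied by hand from the data.dtypes output above
# (assumption: that listing is the full set of columns, in order).
dtypes = {
    "Organization Name": "object",
    "Money Raised Currency (in USD)": "float64",
    "Announced Date": "datetime64[ns]",
    "Total Funding Amount Currency (in USD)": "float64",
    "Organization Description": "object",
    "Organization Location": "object",
    "Raised Series A": "int64",
    "Primary Industry": "object",
}
# Sub_Ind, Sub_Ind2 ... Sub_Ind7
dtypes.update({("Sub_Ind" if i == 1 else f"Sub_Ind{i}"): "object" for i in range(1, 8)})
# Investor1 ... Investor11
dtypes.update({f"Investor{i}": "object" for i in range(1, 12)})

# Same columns that are dropped when building x in the question.
dropped = {"Raised Series A", "Announced Date"}
x_columns = [name for name in dtypes if name not in dropped]
n_features = len(x_columns)

# Positions (within x, not within data) of the object-typed columns.
categorical_idx = [i for i, name in enumerate(x_columns) if dtypes[name] == "object"]

# The list passed to categorical_features in the question.
requested = [0, 1, 3, 4, 5] + list(range(7, 25))

# Indices that do not exist in x at all (valid positions are 0 .. n_features - 1).
out_of_range = [i for i in requested if i >= n_features]
# Indices that exist but point at a non-object (numeric) column.
non_object = [i for i in requested if i < n_features and i not in categorical_idx]

print("columns in x:", n_features)
print("object-column positions:", categorical_idx)
print("requested but out of range:", out_of_range)
print("requested but numeric:", non_object)
```

Under that assumption, `x` has 24 columns, so its positions run from 0 to 23 and the requested index 24 does not exist, which would explain the ValueError even though the message reads "between 0 and 24". The sketch also shows that index 1 points at a float64 column in `x`, and that the object column at position 6 is missing from the requested list. On the real DataFrame, the same positions could probably be derived with something like `[i for i, dt in enumerate(x.dtypes) if dt == object]` instead of typing them by hand.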