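I am working on an ML project where my data is slightly imbalanced, so I need an oversampling technique. The features (X_train) have shape (90664, 190) and the target (Y_binary_train_trans) has shape (90664,). However, the code runs and still reports the same unbalanced target distribution. Here is the code using RandomOverSampler; I have also tried SMOTE: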

counter = Counter(Y_binary_train_trans)
ros = RandomOverSampler(random_state=42)
X_train, Y_binary_train_trans = ros.fit_resample(X_train, Y_binary_train_trans)
counter = Counter(Y_binary_test_trans)

1 Answer

counter = Counter(Y_binary_train_trans)
ros = RandomOverSampler(random_state=42)
X_train, Y_binary_train_trans = ros.fit_resample(X_train, Y_binary_train_trans)
counter = Counter(Y_binary_test_trans)  # counts the test labels, which were never resampled

In this code, your second counter counts the test labels, not the training labels that you actually resampled!

Instead, it should be:

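from collections import Counter
from imblearn.over_sampling import RandomOverSampler

counter = Counter(Y_binary_train_trans)  # class counts before resampling
ros = RandomOverSampler(random_state=42)
X_train, Y_binary_train_trans = ros.fit_resample(X_train, Y_binary_train_trans)
counter = Counter(Y_binary_train_trans)  # class counts after resampling (train, not test)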
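
To see why counting the right array matters, here is a minimal sketch on made-up toy data. It does not use imbalanced-learn: `random_oversample` is a hypothetical, standard-library stand-in for RandomOverSampler (it duplicates random minority rows until the classes are equal), and the names `y_train` and `y_test` are illustrative only. The point is simply that the training counts change after resampling while the test counts stay as they were.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=42):
    """Hypothetical stand-in for RandomOverSampler: duplicate random
    rows of each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of the rows that belong to this class
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw, with replacement, the rows needed to reach the target size
        for i in rng.choices(idx, k=target - n):
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


# Toy data (made up): 8 negatives and 2 positives in train
X_train = [[i] for i in range(10)]
y_train = [0] * 8 + [1] * 2
y_test = [0] * 4 + [1] * 1  # the test split is never resampled

before = Counter(y_train)
X_train, y_train = random_oversample(X_train, y_train)
after = Counter(y_train)

print("train before:", before)
print("train after: ", after)
print("test:        ", Counter(y_test))
```

Printing the test counter after resampling would show the original imbalance, which is the symptom described in the question.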
answered 2020-04-15T09:32:49.973