I am using SMOTE to oversample credit card data, with the code published on geeksforgeeks.org (link).

After running the following code, it reports the following:

print("Before OverSampling, counts of label '1': {}".format(sum(y_train == 1))) 
print("Before OverSampling, counts of label '0': {} \n".format(sum(y_train == 0))) 

# import SMOTE module from imblearn library 
# pip install imblearn (if you don't have imblearn in your system) 
from imblearn.over_sampling import SMOTE 
sm = SMOTE(random_state = 2) 
X_train_res, y_train_res = sm.fit_resample(X_train, y_train.ravel())  # older imblearn releases call this fit_sample

print('After OverSampling, the shape of train_X: {}'.format(X_train_res.shape)) 
print('After OverSampling, the shape of train_y: {} \n'.format(y_train_res.shape)) 

print("After OverSampling, counts of label '1': {}".format(sum(y_train_res == 1))) 
print("After OverSampling, counts of label '0': {}".format(sum(y_train_res == 0))) 

Output:

Before OverSampling, counts of label '1': 345
Before OverSampling, counts of label '0': 199019 

After OverSampling, the shape of train_X: (398038, 29)
After OverSampling, the shape of train_y: (398038,) 

After OverSampling, counts of label '1': 199019
After OverSampling, counts of label '0': 199019

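I am completely new to this field, and I don't understand how to write this data out in CSV format. I would be very glad if someone could help me with this.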

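Alternatively, if there is any reference that shows how to generate synthetic data from a dataset with SMOTE and save the updated dataset to a CSV file, please mention it.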

The result should look like the image below:

[image]

Thanks in advance.

1 Answer

From what I can see in your code, X_train_res and the other variables are NumPy arrays. You can do something like this:

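import numpy as np
import pandas as pd

# Reshape y_train_res from (398038,) to a column vector of shape (398038, 1)
y_train_res = y_train_res.reshape(-1, 1)
# Append the labels as the last column of the feature matrix
data_res = np.concatenate((X_train_res, y_train_res), axis=1)
# Write the combined array to a comma-separated file (no header row)
np.savetxt('sample_smote.csv', data_res, delimiter=",")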

I couldn't run this to check it, so let me know if you run into any problems.

Note: you will have to do a bit more work to add column labels. Let me know once you have done this and need help.
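
One possible way to add the column labels mentioned in the note is to build a pandas DataFrame before writing the file. The following is only a minimal sketch under stated assumptions: the helper name `save_resampled_csv`, the `feature_names` and `label_name` parameters, the column names `V1`, `V2`, and `Class`, and the small demo arrays are all hypothetical and stand in for `X_train_res`, `y_train_res`, and your real column names.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def save_resampled_csv(X_res, y_res, feature_names, path, label_name="Class"):
    """Write resampled features and labels to a CSV file with a header row.

    The function name and parameters are hypothetical, chosen for illustration.
    """
    X_res = np.asarray(X_res)
    y_res = np.asarray(y_res).ravel()  # accept shape (n,) or (n, 1)
    df = pd.DataFrame(X_res, columns=list(feature_names))
    df[label_name] = y_res  # labels become the last column
    df.to_csv(path, index=False)  # index=False omits the row-number column
    return df


# Tiny stand-in arrays; in the question these would be X_train_res and y_train_res.
X_demo = np.array([[0.1, 1.0], [0.2, 2.0], [0.3, 3.0], [0.4, 4.0]])
y_demo = np.array([0, 0, 1, 1])

# Written to the temp directory here only to keep the demo tidy.
demo_path = os.path.join(tempfile.gettempdir(), "sample_smote_demo.csv")
demo_df = save_resampled_csv(X_demo, y_demo, ["V1", "V2"], demo_path)
print(demo_df.head())
```

With the data from the question, and assuming `X_train` was a pandas DataFrame before resampling, the call might look like `save_resampled_csv(X_train_res, y_train_res, X_train.columns, 'sample_smote.csv')`. Depending on the installed imbalanced-learn version, the resampled objects may already be pandas objects when the inputs were, in which case calling `to_csv` on them directly could be enough.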

Answered 2019-11-01T05:57:51.963