image-processing - Oversampling image data with SMOTE

Question

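I'm using a CNN for binary classification, and my data is imbalanced: positive medical images : negative medical images = 0.4 : 0.6. So I want to use SMOTE to oversample the positive medical images before training. However, the data is 4D (761, 64, 64, 3), which causes the error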

Found array with dim 4. Estimator expected <= 2

So I reshaped my train_data:

X_res, y_res = smote.fit_resample(X_train.reshape(X_train.shape[0], -1), y_train.ravel())

That works. Before feeding the result to the CNN, I reshape it back with:

X_res = X_res.reshape(X_res.shape[0], 64, 64, 3)

Now I'm not sure whether this is the correct way to oversample. Does the reshape operation change the structure of the images?
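One way to check the second question directly is to flatten a batch, reshape it back, and compare the result with the original. A minimal NumPy sketch, using random uint8 data with the question's shape (761, 64, 64, 3) as a stand-in for the real images:

```python
import numpy as np

# Stand-in for X_train: random uint8 "images" with the question's shape
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(761, 64, 64, 3), dtype=np.uint8)

# Flatten each image into one row, as done before SMOTE
flat = X.reshape(X.shape[0], -1)
print(flat.shape)  # (761, 12288), since 64 * 64 * 3 = 12288

# Reshape back to (n_images, height, width, channels)
restored = flat.reshape(flat.shape[0], 64, 64, 3)

# Same shape, same pixel values in the same positions
print(np.array_equal(restored, X))  # True
```

Because both reshapes use NumPy's default (C) order, the round trip does not move any pixel. What does change the data is SMOTE itself: the extra rows it returns are synthetic samples, so reshaping them back yields new images rather than copies of existing ones.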

score 6 · Accepted Answer

I had a similar problem. I used the reshape function to reshape the images (essentially flattening them):

X_train.shape
(8000, 250, 250, 3)

ReX_train = X_train.reshape(8000, 250 * 250 * 3)
ReX_train.shape
(8000, 187500)

from imblearn.over_sampling import SMOTE

smt = SMOTE()
Xs_train, ys_train = smt.fit_resample(ReX_train, y_train)

This approach is painfully slow, but it did help improve performance.
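Part of the slowness is easier to see from the size of the data. SMOTE looks up the k nearest neighbours of minority samples, so every distance computation runs over all 187,500 features of each flattened image. A back-of-the-envelope calculation of the flattened matrix size (the dtypes are illustrative; which one your array actually has depends on how the images were loaded):

```python
# Size of the flattened training matrix from the answer above
n_images = 8000
n_features = 250 * 250 * 3        # 187500 values per image
n_values = n_images * n_features  # 1.5 billion values in total

# Memory for the whole matrix at a few common dtypes (bytes per value)
for dtype_name, nbytes in [("uint8", 1), ("float32", 4), ("float64", 8)]:
    gb = n_values * nbytes / 1e9
    print(f"{dtype_name}: {gb:.1f} GB")
# uint8: 1.5 GB
# float32: 6.0 GB
# float64: 12.0 GB
```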

score 1 · Accepted Answer

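Once you flatten the images, you lose the spatial (localization) information, and preserving that information is one of the reasons convolutions are used for image-based machine learning.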
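8000x250x250x3 has an inherent meaning: 8000 image samples, each 250 pixels wide and 250 pixels high, each with 3 channels. Once you reshape it to 8000x(250*250*3), each sample is just a long vector of numbers, and unless you use some kind of sequence network, the model can no longer learn from that structure.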
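To make this concrete: SMOTE creates a synthetic sample by interpolating between a minority sample x_i and one of its nearest neighbours x_nn, x_new = x_i + lambda * (x_nn - x_i) with lambda drawn from [0, 1). On flattened images this is a pixel-by-pixel blend of two images. The function below is a hand-written illustration of that single step, not imblearn's code, and lambda is fixed at 0.5 for readability:

```python
import numpy as np

def smote_like_sample(x_i, x_nn, lam):
    """Illustration of SMOTE's interpolation step (not imblearn's code):
    move from x_i towards its neighbour x_nn by a fraction lam."""
    return x_i + lam * (x_nn - x_i)

# Two tiny 2x2 single-channel "images" of the same class
a = np.array([[0.0, 0.0], [0.0, 255.0]])  # bright pixel bottom-right
b = np.array([[255.0, 0.0], [0.0, 0.0]])  # bright pixel top-left

# Interpolate the flattened vectors, then reshape back to 2x2
new = smote_like_sample(a.ravel(), b.ravel(), lam=0.5).reshape(2, 2)
print(new.tolist())  # [[127.5, 0.0], [0.0, 127.5]]
```

The synthetic "image" is a half-bright ghost of both inputs, not the bright pixel moved to a new plausible position: the interpolation treats every pixel independently and knows nothing about neighbouring pixels.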
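Oversampling in this flattened form is not well suited to image data. Instead, you can do image augmentation (cropping, adding noise such as Gaussian blur, rotation, translation, etc.).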
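A minimal NumPy sketch of the augmentation alternative, assuming images in (N, H, W, C) layout with pixel values from 0 to 255. The helper name augment_minority and its settings (flip probability 0.5, shifts of up to 4 pixels, noise standard deviation 5) are illustrative assumptions. It uses np.roll for translation, which wraps pixels around the border, and additive Gaussian noise rather than Gaussian blur to avoid extra dependencies:

```python
import numpy as np

def augment_minority(X, y, minority_label, n_new, rng):
    """Hypothetical helper: append n_new augmented copies of randomly
    chosen minority-class images from X, shaped (N, H, W, C)."""
    if n_new <= 0:
        return X, y
    minority = X[y == minority_label]
    idx = rng.integers(0, len(minority), size=n_new)
    new_images = []
    for img in minority[idx].astype(np.float32):
        if rng.random() < 0.5:
            img = img[:, ::-1, :]                        # horizontal flip
        dy, dx = rng.integers(-4, 5, size=2)             # shift of up to 4 px
        img = np.roll(img, shift=(dy, dx), axis=(0, 1))  # wraps at the edges
        img = img + rng.normal(0.0, 5.0, size=img.shape) # Gaussian noise
        new_images.append(np.clip(img, 0, 255))          # keep valid range
    X_new = np.stack(new_images).astype(np.float32)
    X_aug = np.concatenate([X.astype(np.float32), X_new])
    y_aug = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_aug, y_aug

# Usage on toy data: 4 positives vs 6 negatives, balanced with 2 new positives
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(10, 64, 64, 3)).astype(np.float32)
y = np.array([1] * 4 + [0] * 6)
X_bal, y_bal = augment_minority(X, y, minority_label=1, n_new=2, rng=rng)
print(X_bal.shape, np.bincount(y_bal))  # (12, 64, 64, 3) [6 6]
```

Unlike SMOTE on flattened pixels, each new sample is a transformed version of a real image, so the spatial structure the CNN relies on is preserved.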

score 1 · Accepted Answer

First flatten the images
Apply SMOTE to the flattened image data and its labels
Reshape the flattened images back into RGB images

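from imblearn.over_sampling import SMOTE

sm = SMOTE(random_state=42)

# Flatten each image into a single row: (n_images, 100*100*3)
train_rows = len(X_train)
X_train = X_train.reshape(train_rows, -1)
# X_train.shape -> (80, 30000)

X_train, y_train = sm.fit_resample(X_train, y_train)
# Restore the (height, width, channels) layout
X_train = X_train.reshape(-1, 100, 100, 3)
# X_train.shape -> (>80, 100, 100, 3)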
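The same three steps can be wrapped in a small function that reads the image shape from the input instead of hardcoding 100, 100, 3 (or 64, 64, 3 in the question). This is a sketch: the name resample_images is hypothetical, and sampler is assumed to be any object with a fit_resample(X, y) method, such as imblearn's SMOTE:

```python
import numpy as np

def resample_images(X, y, sampler):
    """Flatten (N, H, W, C) images, resample them with any object that
    has fit_resample(X, y), then restore the original image shape."""
    image_shape = X.shape[1:]              # e.g. (100, 100, 3)
    X_flat = X.reshape(len(X), -1)         # (N, H*W*C)
    X_res, y_res = sampler.fit_resample(X_flat, y)
    X_res = np.asarray(X_res)              # ensure an ndarray before reshaping
    return X_res.reshape(-1, *image_shape), y_res
```

With imbalanced-learn installed, the call would be resample_images(X_train, y_train, SMOTE(random_state=42)), and the returned X keeps whatever height, width and channel count the input had.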
