python - How can I use a dictionary with the SMOTE algorithm to resample multi-class input data differently per class?

Question

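I want to perform oversampling with the SMOTE algorithm in Python using the imblearn.over_sampling library. My input data has four target classes. I don't want to oversample every minority class up to the size of the majority class. Instead, I want to oversample each minority class by a different amount.

When I use SMOTE(sampling_strategy = 1, k_neighbors=2,random_state = 1000), I get the following error.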

ValueError: "sampling_strategy" can be a float only when the type of target is binary. For multi-class, use a dict.

Then, following the error message, I passed a dictionary as "sampling_strategy", as shown below:

SMOTE(sampling_strategy={'1.0':70,'3.0':255,'2.0':50,'0.0':150},k_neighbors=2,random_state = 1000)

However, that gives the following error:

ValueError: The {'2.0', '1.0', '0.0', '3.0'} target class is/are not present in the data.

Does anyone know how to define the dictionary so that SMOTE oversamples each class differently?

score 0 · Accepted Answer

You have to specify the desired number of samples for each class and pass that dictionary to the SMOTE object.

Code:

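import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

# 200 samples with 13 random integer features each
x1 = np.random.randint(500, size =(200,13))
# Imbalanced integer labels: 100 / 65 / 25 / 10 samples for classes 0..3
y1 = np.concatenate([np.array([0]*100), np.array([1]*65), np.array([2]*25), np.array([3]*10)])
np.random.shuffle(y1)
# Class distribution before resampling
Counter(y1)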

Output:

Counter({0: 100, 1: 65, 2: 25, 3: 10})

Code:

sm = SMOTE(sampling_strategy = {0: 100, 1: 70, 2: 90, 3: 40})
X_res, y_res = sm.fit_resample(x1, y1)
Counter(y_res)

Output:

Counter({0: 100, 1: 70, 2: 90, 3: 40})

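For details, see the imbalanced-learn documentation for the sampling_strategy parameter.

The error you got occurs because the labels specified in your dictionary do not match the actual labels in the data. Your keys are strings such as '1.0', whereas the labels in your target are most likely numeric (for example 1.0), so the keys need to have the same type and value as the labels.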
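
One way to avoid the mismatch is to build the dictionary from the labels that are actually present in the target. The sketch below is only an illustration: build_sampling_strategy is a hypothetical helper (it is not part of imblearn), and the label values and class counts are assumed, reusing the counts from the example above with float labels. It maps string keys such as '1.0' back to the real labels, and rejects targets smaller than the current class size, since over-sampling cannot shrink a class.

```python
from collections import Counter


def build_sampling_strategy(y, desired):
    """Return a dict keyed by the labels actually present in y.

    `desired` maps a label (possibly written as a string such as '1.0')
    to the target number of samples for that class.
    Hypothetical helper for illustration; not part of imblearn.
    """
    counts = Counter(y)
    # Map the text form of each real label back to the real label,
    # so that a key like '1.0' can be resolved to the float 1.0.
    by_text = {str(label): label for label in counts}

    strategy = {}
    for key, n_target in desired.items():
        label = key if key in counts else by_text.get(str(key))
        if label is None:
            raise ValueError(f"label {key!r} is not present in y")
        if n_target < counts[label]:
            # Over-sampling can only add samples to a class.
            raise ValueError(
                f"target {n_target} for label {label!r} is below "
                f"the current count {counts[label]}"
            )
        strategy[label] = n_target
    return strategy


# Assumed data: float labels with the same class sizes as the example above.
y = [0.0] * 100 + [1.0] * 65 + [2.0] * 25 + [3.0] * 10
strategy = build_sampling_strategy(
    y, {'1.0': 70, '3.0': 255, '2.0': 50, '0.0': 150}
)
print(strategy)
```

The resulting dictionary has numeric keys and could then be passed on, for example as SMOTE(sampling_strategy=strategy, k_neighbors=2, random_state=1000), assuming the target array really holds float labels like these.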
