machine-learning - What is the correct order of steps when training an ML model?

Question

I have a dataset with an imbalanced multiclass dependent variable. I would like to know the correct order of steps for training a model:

1) standardizing - oversampling - train/test split

2) train/test split - standardizing - oversampling

3) train/test split - oversampling - standardizing

score 0 · Accepted Answer

Welcome aboard.

Regarding your question, a better approach would probably be:

preprocessing -> train test split -> normalizing -> over/undersampling

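Preprocessing: this must be your first task. It includes removing errors from the data and joining all the required data that is scattered across the company.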

Train/test split: this must be the next step, for two reasons:

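Normalizing: it is good practice to normalize the data before sampling, because some sampling methods use a model to generate new data points, and feeding them normalized data yields better samples.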
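
To illustrate why scale can matter to a sampler, here is a small sketch. It assumes a neighbor-based sampler (SMOTE-style interpolation is one example, though the answer does not name a specific method) and uses made-up toy data with one large-scale column and one small-scale column. It only shows how the nearest neighbor of a point can change after standardization, not any particular library's behavior.

```python
import numpy as np

# Hypothetical toy data: column 0 is on a large scale (e.g. income),
# column 1 is a ratio in [0, 1]. Row 0 is the query point.
X = np.array([
    [50_000.0, 0.10],  # query point
    [50_500.0, 0.90],  # close on the large-scale column, far on the ratio
    [52_000.0, 0.12],  # slightly farther on the large-scale column, close on the ratio
    [30_000.0, 0.50],
    [70_000.0, 0.50],
])

def nearest_neighbor(data, query_idx=0):
    """Return the index of the row closest (Euclidean) to row `query_idx`."""
    dists = np.linalg.norm(data - data[query_idx], axis=1)
    dists[query_idx] = np.inf  # exclude the query point itself
    return int(np.argmin(dists))

# On raw data the large-scale column dominates the distance.
raw_nn = nearest_neighbor(X)

# Standardize each column to zero mean and unit variance, then search again.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
scaled_nn = nearest_neighbor(X_scaled)

print("nearest neighbor on raw data:   ", raw_nn)
print("nearest neighbor on scaled data:", scaled_nn)
```

On the raw data the neighbor is row 1, chosen almost entirely by the large-scale column. After standardization both columns contribute and the neighbor becomes row 2, so a sampler that interpolates between neighbors would generate different synthetic points.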

Sampling: finally, resample your data. I suggest evaluating different sampling methods and sampling rates and comparing the results.
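
The order above could be sketched roughly as follows. This is only a minimal illustration under several assumptions: the data is synthetic (scikit-learn's make_classification), random_oversample is a hypothetical helper that does plain random oversampling as a stand-in for whichever sampler you end up choosing (a dedicated library such as imbalanced-learn may be a better fit in practice), and the scaler is fitted on the training part only so that the test part does not influence it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced 3-class data (purely illustrative).
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, weights=[0.8, 0.15, 0.05], random_state=0,
)

# 1) Split first, stratified so every class appears in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2) Fit the scaler on the training part only, then apply it to both parts.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# 3) Oversample the training part only; the test part keeps its real distribution.
def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate random minority rows up to the majority count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx_parts = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Draw the missing rows with replacement (zero rows for the majority class).
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        idx_parts.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(idx_parts)
    return X[idx], y[idx]

X_train_bal, y_train_bal = random_oversample(X_train_s, y_train)

print("train class counts before:", np.bincount(y_train))
print("train class counts after: ", np.bincount(y_train_bal))
print("test class counts:        ", np.bincount(y_test))
```

To compare sampling methods and rates as suggested, you could swap random_oversample for another sampler or change its target count, train the same model on each variant, and score all of them on the untouched X_test_s / y_test.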
