r - In k-fold cross-validation, do we train the algorithm on the (k-1) subsets one by one or on the combined (k-1) subsets at once?

Question

What I mean is this: let's say I have 10 subsets (set1, set2, ..., set10) of a training set. To perform 10-fold CV, as I understand it, I should train my algorithm on rbind(set2, set3, ..., set9, set10) and test it on set1. Then I train it on rbind(set1, set3, set4, ..., set10) and test it on set2, and so on. Am I correct?

However, I have a feeling that we might instead train the algorithm on set2, set3, ..., set10 one by one and test each model on set1. That way we would have 9 sets of predictions on set1, which we could then average. Which one is the correct way?

Any help would be greatly appreciated.

Thank you.

score 0 · Accepted Answer

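Your understanding is correct: one set is held out for testing, and the remaining sets are combined and used for training.

Please refer to the question and its second answer at "10-fold cross-validation".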
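A minimal sketch of that procedure in base R, in case it helps. The toy data, the `lm` model, and the names `dat`, `fold_id` and `cv_mse` are assumptions made for illustration, not anything from the question. Selecting the rows with `fold_id != i` plays the role of the `rbind()` over the nine remaining sets.

```r
# Sketch: 10-fold CV where each model is fit on the 9 remaining folds combined.
set.seed(42)                                  # arbitrary seed, for reproducibility only
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n)                 # toy data (assumption, not from the question)

k <- 10
fold_id <- sample(rep(seq_len(k), length.out = n))  # random fold label 1..k for every row

cv_mse <- numeric(k)
for (i in seq_len(k)) {
  test_set  <- dat[fold_id == i, ]            # the single held-out fold
  train_set <- dat[fold_id != i, ]            # all other folds combined
  fit  <- lm(y ~ x, data = train_set)         # exactly one model per iteration
  pred <- predict(fit, newdata = test_set)
  cv_mse[i] <- mean((test_set$y - pred)^2)    # error on the held-out fold
}

mean(cv_mse)                                  # cross-validated estimate of the test error
```

Each of the 10 iterations fits one model on 180 rows and scores it on the other 20, so every row is predicted exactly once by a model that never saw it.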

score 0 · Accepted Answer

The situation is similar to the one described here:

[image]

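As a side note, you will do better if you make sure that the prior probabilities of your classes (the ones to be predicted) are roughly equal across all the sets (set1, set2, ..., set10).

This is called stratified k-fold cross-validation: the folds are chosen so that the mean response value is approximately equal in all of them. In the case of binary classification, this means that each fold contains roughly the same proportion of the two class labels.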
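One way to build such folds by hand is to assign fold labels separately within each class. This is only a sketch in base R: the 70/30 toy labels and the names `y`, `fold_id` and `tab` are assumptions made for illustration.

```r
# Sketch: stratified fold assignment, done class by class.
set.seed(1)                                   # arbitrary seed, for reproducibility only
y <- factor(c(rep("a", 70), rep("b", 30)))    # toy imbalanced labels (assumption)
k <- 10

fold_id <- integer(length(y))
for (cls in levels(y)) {
  idx <- which(y == cls)                      # rows belonging to this class
  # spread this class's rows evenly over the k folds, in random order
  fold_id[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
}

tab <- table(fold_id, y)                      # class counts per fold
prop.table(tab, margin = 1)                   # class proportions within each fold
```

With these toy labels every fold ends up with 7 rows of class "a" and 3 of class "b", so each fold keeps the overall 70/30 split. The resulting `fold_id` can be used in the usual loop, holding out one fold and training on the rest combined.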
