
Let's say I have 10 subsets (set1, set2, ..., set10) of a training set. To perform 10-fold CV, as I understand it, I should train my algorithm on rbind(set2, set3, ..., set9, set10) and test it on set1. Then I would train it on rbind(set1, set3, set4, ..., set10) and test it on set2, and so on. Am I correct?

Alternatively, I have a feeling that we train the algorithm on set2, set3, ..., set10 one at a time and test each resulting model on set1. This way we get 9 sets of predictions on set1, which we can then average. Which one is the correct way?

Any help would be greatly appreciated.

Thank you.


2 Answers


Your first understanding is correct: one set is held out for testing, and the remaining sets are combined and used for training.

Please see the question "10-fold cross-validation" and its second answer.
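To make the procedure concrete, here is a minimal sketch in plain Python rather than R. The function name `k_fold_splits` and the 100-row data size are assumptions made for illustration; pooling the other folds into one training set plays the role of the `rbind(...)` call in the question.

```python
import random

def k_fold_splits(n_rows, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled row indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Pool the other k-1 folds into ONE training set (the rbind step).
        train_idx = [j for f_no, fold in enumerate(folds) if f_no != i for j in fold]
        yield train_idx, test_idx

# Hypothetical data size: 100 rows split into 10 folds.
splits = list(k_fold_splits(n_rows=100, k=10))
for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_no}: train on {len(train_idx)} rows, test on {len(test_idx)} rows")
```

Each round fits a single model on the pooled nine folds and scores it on the held-out fold. It is the ten held-out scores that get averaged, not nine separate models' predictions on one fold.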

Answered 2013-06-04T07:37:43.883

The situation is similar to the one shown here:

[image]

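As a side note, you will do better if you make sure that the prior probabilities of the classes to be predicted are roughly equal across all the sets (set1, set2, ..., set10).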

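This is called stratified k-fold cross-validation: the folds are chosen so that the mean response value is approximately equal in all folds. In the case of binary classification, this means that each fold contains roughly the same proportions of the two class labels.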
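One simple way to build such folds is to deal the rows of each class out round-robin. The sketch below assumes a hypothetical label vector with 30 "pos" and 70 "neg" rows, and the helper name `stratified_folds` is made up for illustration; it is not a library function.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Assign each row index to one of k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal this class round-robin so every fold gets its share of it.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds

# Hypothetical binary labels: 30% positive, 70% negative.
labels = ["pos"] * 30 + ["neg"] * 70
folds = stratified_folds(labels, k=10)
fold_counts = [Counter(labels[i] for i in fold) for fold in folds]
for fold_no, counts in enumerate(fold_counts, start=1):
    print(f"fold {fold_no}: {dict(counts)}")
```

With these counts every fold ends up with 3 "pos" and 7 "neg" rows, matching the overall 30/70 split. When a class size is not a multiple of k, the fold counts for that class differ by at most one.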

Answered 2013-08-21T14:37:38.343