matlab - Leave-one-out cross-validation - MATLAB

Question

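I am trying to classify a dataset using the following strategy:

Leave-one-out cross-validation
KNN classification of each "fold" (counting the number of errors)
Compute the final error
Repeat for k=[1,2,3,4,5,7,10,12,15,20]

Here is the code for the fisheriris dataset: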

load fisheriris
cur=meas;true_label=species;

for norm=0:2
    % normalizamos is just a function I use on my dataset for normalization:
    % norm=0 means no normalization; norm=1 and norm=2 are two different normalizations
    feats=normalizamos(cur,norm);

    c=cvpartition(size(feats,1),'leaveout');

    for k=[1,2,3,4,5,7,10,12,15,20]

        n_erros=zeros(1,c.NumTestSets); % reset the per-fold error counts for this k
        for i=1:c.NumTestSets
            tr=c.training(i);te=c.test(i);

            train_set=feats(tr,:);
            test_set=feats(te,:);

            train_class=true_label(tr);
            test_class=true_label(te);

            pred=knnclassify(test_set,train_set,train_class,k);
            n_erros(i)=sum(~strcmp(pred,test_class));
        end

        err_rate=sum(n_erros)/sum(c.TestSize)
    end
end

Since the results (for my dataset) showed strangely inconsistent values, I decided to write my own version of LOO, as follows:

% Manual leave-one-out; feats and k come from the enclosing loops of the first listing
n_erros=zeros(1,size(feats,1));
for i=1:size(feats,1)

    % hold out observation i as the test set
    test_set=feats(i,:);
    test_class=true_label(i);

    % train on all remaining observations (feats(1:0,:) is empty when i==1)
    train_set=[feats(1:i-1,:);feats(i+1:end,:)];
    train_class=[true_label(1:i-1);true_label(i+1:end)];

    pred=knnclassify(test_set,train_set,train_class,k);
    n_erros(i)=sum(~strcmp(pred,test_class));
end

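Assuming my version of the code is written correctly, I expected to get the same, or at least similar, results. Here are the two results: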

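Any idea why the results are so different? Which version should I use? I am now considering rewriting the other tests I have done (3-fold, 5-fold, etc.), just to be sure.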
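
One possible cross-check for both loops is to compute the same leave-one-out error with the toolbox's built-in cross-validation. The sketch below assumes the Statistics and Machine Learning Toolbox is installed (fitcknn, crossval, kfoldLoss) and runs on the raw fisheriris measurements with no normalization. The names ks and loo_err are introduced here only for illustration. Tie-breaking defaults in fitcknn may differ from those of knnclassify, so small differences for even k would not necessarily indicate a bug.

```matlab
% Sketch: leave-one-out error of a k-NN classifier on fisheriris.
% Assumption: Statistics and Machine Learning Toolbox is available.
load fisheriris                      % provides meas (features) and species (labels)

ks = [1 2 3 4 5 7 10 12 15 20];      % neighbourhood sizes from the question
loo_err = zeros(size(ks));           % one error rate per value of k

for j = 1:numel(ks)
    % k-NN model with ks(j) neighbours, fitted on the full data set
    mdl = fitcknn(meas, species, 'NumNeighbors', ks(j));

    % leave-one-out partition: one fold per observation
    cvmdl = crossval(mdl, 'Leaveout', 'on');

    % default loss is the classification error (fraction misclassified)
    loo_err(j) = kfoldLoss(cvmdl);
end

disp([ks(:) loo_err(:)])             % k in column 1, error rate in column 2
```

If these values agree with one of the two hand-written loops but not the other, that narrows down where the discrepancy comes from.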

Thank you all.
