matlab - Accuracy drops in LibSVM

Question

After getting my testlabel and trainlabel, I implemented an SVM with libsvm and got an accuracy of 97.4359% (c = 1 and g = 0.00375).

model = svmtrain(TrainLabel, TrainVec, '-c 1 -g 0.00375');
[predict_label, accuracy, dec_values] = svmpredict(TestLabel, TestVec, model);

Then I searched for the best c and g:

bestcv = 0;
for log2c = -1:3,
  for log2g = -4:1,
    cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
    cv = svmtrain(TrainLabel,TrainVec, cmd);
    if (cv >= bestcv),
      bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
    end
    fprintf('%g %g %g (best c=%g, g=%g, rate=%g)\n', log2c, log2g, cv, bestc, bestg, bestcv);
  end
end

This gave c = 8 and g = 0.125.

I trained the model again with these values:

model = svmtrain(TrainLabel, TrainVec, '-c 8 -g 0.125');
[predict_label, accuracy, dec_values] = svmpredict(TestLabel, TestVec, model);

This time my accuracy was 82.0513%.

How can the accuracy go down? Shouldn't it go up? Or did I make a mistake somewhere?

score 4 · Accepted Answer

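The accuracy you get during parameter tuning is biased upward, because you are predicting the same data you are training on. That is usually fine for parameter tuning.

However, if you want those accuracies to be accurate estimates of the true generalization error on your final test set, you have to add an extra layer of cross-validation or some other resampling scheme.

Here is a very clear paper that outlines the general problem (though in the analogous context of feature selection): http://www.pnas.org/content/99/10/6562.abstract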
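A small simulation can make the upward bias concrete. This is a hypothetical sketch in plain MATLAB (no LibSVM needed), and every number in it is made up: each of the 30 settings in a 5 x 6 grid (the size of the question's log2c/log2g grid) is assumed to have the same true accuracy of 0.80, each tuning score is that value plus random noise, and the grid search reports the best score it saw. Averaged over many repetitions, the reported best score sits above 0.80 even though no setting is actually better than any other.

```matlab
% Hypothetical simulation: all parameter settings share the same true
% accuracy, but each measured tuning score carries random noise.
rng(0);                      % fixed seed for reproducibility
trueAcc   = 0.80;            % assumed true accuracy of every setting
noiseSd   = 0.03;            % assumed spread of a single tuning estimate
nSettings = 30;              % grid size (5 values of c x 6 values of g)
nTrials   = 1000;            % repeat the whole tuning experiment many times

% Each row is one tuning run: noisy scores for all settings
estimates = trueAcc + noiseSd * randn(nTrials, nSettings);

% The grid search keeps the best score it saw in each run
bestEstimate = max(estimates, [], 2);

fprintf('true accuracy:               %.3f\n', trueAcc);
fprintf('mean of reported best score: %.3f\n', mean(bestEstimate));
```

The gap between the two printed numbers comes only from picking the maximum over noisy scores, which is why the score used to choose c and g should not double as the final estimate.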

Edit:

I usually add cross-validation like this:

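n     = 95;  % total number of observations (rows of data)
nfold = 10;  % desired number of folds

% Set up CV folds: repeat the labels 1..nfold enough times to cover n
% observations, trim to exactly n, then shuffle
inds = repmat(1:nfold, 1, ceil(n / nfold));
inds = inds(1:n);
inds = inds(randperm(n));

% Loop over folds
for i = 1:nfold
  datapart = data(inds ~= i, :);  % training part: every row not in fold i
  heldout  = data(inds == i, :);  % held-out part: the rows of fold i

  % do some stuff

  % save results
end

% combine results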
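One way the "% do some stuff" step could look for this question is nested cross-validation, sketched below under assumptions: LibSVM's MATLAB interface (svmtrain/svmpredict) is first on the path (older MATLAB releases shipped their own svmtrain), X and y are synthetic stand-ins for TrainVec/TrainLabel, and the c/g grid is the one from the question. Inside each outer fold, c and g are tuned with LibSVM's built-in -v option on the outer-training rows only; the held-out fold is used exactly once, for scoring.

```matlab
% Nested cross-validation sketch with LibSVM. X and y are hypothetical data.
rng(1);
X = [randn(60, 2) + 1; randn(60, 2) - 1];   % hypothetical features
y = [ones(60, 1); -ones(60, 1)];            % hypothetical labels
n = numel(y);
nfold = 5;                                  % outer folds

% Balanced, shuffled fold labels
inds = repmat(1:nfold, 1, ceil(n / nfold));
inds = inds(1:n);
inds = inds(randperm(n));

outerAcc = zeros(nfold, 1);
for i = 1:nfold
  tr = inds ~= i;                           % outer-training rows
  te = inds == i;                           % held-out outer fold

  % Inner loop: tune c and g with 5-fold CV on the outer-training rows only
  bestcv = -Inf;
  for log2c = -1:3
    for log2g = -4:1
      opts = sprintf('-v 5 -c %g -g %g -q', 2^log2c, 2^log2g);
      cv = svmtrain(y(tr), X(tr, :), opts); % with -v, returns CV accuracy
      if cv > bestcv
        bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
      end
    end
  end

  % Refit on all outer-training rows, then score the untouched fold once
  model = svmtrain(y(tr), X(tr, :), sprintf('-c %g -g %g -q', bestc, bestg));
  [~, acc] = svmpredict(y(te), X(te, :), model, '-q');
  outerAcc(i) = acc(1);                     % accuracy in percent
end

% Combine results: estimates the accuracy of the whole tune-then-train procedure
fprintf('nested CV accuracy: %.2f%% (std %.2f)\n', mean(outerAcc), std(outerAcc));
```

The outer folds can pick different c and g; the mean of outerAcc describes the tuning procedure as a whole rather than one particular parameter pair.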

score 1 · Accepted Answer

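To do cross-validation, you should split your training data. Here you test on the training data to find your best parameter set, which is not a good measure. You should use something like the following: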

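% Assumes LibSVM's MATLAB interface; TrainLabel/TrainVec as in the question
n = numel(TrainLabel);
perm = randperm(n);
nVal = round(0.2 * n);                              % hold out 20% for validation
valIdx = perm(1:nVal);  trIdx = perm(nVal+1:end);   % you can repeat the split several times and take the mean accuracy
bestAcc = -Inf;
for log2c = -1:3
  for log2g = -4:1
    param = sprintf('-c %g -g %g -q', 2^log2c, 2^log2g);
    model = svmtrain(TrainLabel(trIdx), TrainVec(trIdx, :), param);
    [~, acc] = svmpredict(TrainLabel(valIdx), TrainVec(valIdx, :), model, '-q');
    if acc(1) > bestAcc                             % acc(1) is accuracy in percent
      bestAcc = acc(1); bestParam = param;
    end
  end
end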
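The note about repeating the split can be sketched for a single candidate setting as below. This is a hypothetical example: X and y are synthetic stand-ins for the training set, the parameter string and the 20% validation ratio are arbitrary choices, and LibSVM's MATLAB interface is assumed to be on the path. Each repeat draws a fresh random train/validation split, and the mean validation accuracy is the number you would compare across settings.

```matlab
% Repeated random hold-out for ONE candidate setting (hypothetical data).
rng(2);
X = [randn(60, 2) + 1; randn(60, 2) - 1];   % hypothetical features
y = [ones(60, 1); -ones(60, 1)];            % hypothetical labels
param   = '-c 1 -g 0.5 -q';                 % one candidate from the grid
nRepeat = 10;                               % number of random splits
n    = numel(y);
nVal = round(0.2 * n);                      % validation size per split

accs = zeros(nRepeat, 1);
for r = 1:nRepeat
  perm   = randperm(n);                     % fresh random split each repeat
  valIdx = perm(1:nVal);
  trIdx  = perm(nVal+1:end);
  model  = svmtrain(y(trIdx), X(trIdx, :), param);
  [~, acc] = svmpredict(y(valIdx), X(valIdx, :), model, '-q');
  accs(r) = acc(1);                         % accuracy in percent
end
meanAcc = mean(accs);                       % compare this across candidates
fprintf('mean validation accuracy over %d splits: %.2f%%\n', nRepeat, meanAcc);
```

Averaging over several splits reduces the chance that one lucky split decides which parameters win.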
