python - Why does xgboost cross-validation perform so well while train/predict performs so poorly?

Question

I am using xgboost and trying to train a model. Here is some of my code:

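import xgboost as xgb

def trainModel(training_data_filepath):

    # loadDataFromFile is the asker's own helper (not shown); it presumably returns an xgb.DMatrix
    training_data = loadDataFromFile(training_data_filepath)

    algorithm_parameters = {'max_depth': 2, 'eta': 1, 'silent': 1, 'objective': 'binary:logistic'}
    num_rounds = 1

    # 2-fold cross-validation on the training data, reporting classification error
    print(xgb.cv(algorithm_parameters, training_data, num_rounds, nfold=2, metrics={'error'}, seed=0))
    # Train on all of the training data; num_rounds is not passed, so xgb.train uses its own default
    return xgb.train(algorithm_parameters, training_data)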

The cross-validation prints:

test-error-mean  test-error-std  train-error-mean  train-error-std
       0.020742               0          0.019866         0.000292

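I read that as a test error of about two percent, which is very good. But I also ran my own test of the returned model on a hold-out set drawn from the training data: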

def testModel(classifier, test_data_filepath):

    test_data = loadDataFromFile(test_data_filepath)
    # With binary:logistic, predict() returns a probability per example
    predictions = classifier.predict(test_data)
    labels = test_data.get_label()

    # Fraction of examples whose thresholded prediction disagrees with the label
    test_error = sum([1 for i in range(len(predictions)) if int(predictions[i]>0.5) != labels[i]]) / float(len(predictions))
    print('Classifier test error: ' + repr(test_error))

which prints:

Classifier test error: 0.2786214953271028

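That is about 28%, which is far worse. Why does this happen? How can a model trained on all of the training data fail on the hold-out set when cross-validation on that same training data looks so good? I have to assume there is a flaw in my logic, but I can't see one. Either that, or xgboost's CV implementation does something I don't understand.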

score 0 · Accepted Answer

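It turns out I was being a fool. I was creating the training set and the hold-out set separately, so the two had different feature indices for the same tokens, which means it is a miracle the model did better than random chance at all. I think that is what confused me: even with completely different feature indices, the accuracy was much better than 50%.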
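
To illustrate the fix, here is a minimal sketch of building the token-to-index mapping once, from the training data, and reusing it for the hold-out set. The original `loadDataFromFile` is not shown, so everything here is an assumption: `build_vocabulary`, `vectorize`, and the toy token lists are hypothetical names and data, not the asker's actual pipeline.

```python
def build_vocabulary(documents):
    """Assign each distinct token a column index, in order of first appearance."""
    vocab = {}
    for tokens in documents:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def vectorize(tokens, vocab):
    """Turn a token list into a count vector using an existing vocabulary.

    Tokens missing from the vocabulary are ignored, so the vector always
    has exactly len(vocab) columns.
    """
    row = [0] * len(vocab)
    for token in tokens:
        index = vocab.get(token)
        if index is not None:
            row[index] += 1
    return row


# Hypothetical toy data standing in for the real training and hold-out files.
train_docs = [["cheap", "pills", "now"], ["meeting", "at", "noon"]]
holdout_docs = [["noon", "meeting", "cheap"]]

# Correct: one vocabulary, built from the training data, shared by both sets.
vocab = build_vocabulary(train_docs)
train_rows = [vectorize(doc, vocab) for doc in train_docs]
holdout_rows = [vectorize(doc, vocab) for doc in holdout_docs]

# The mistake described above: a second vocabulary built from the hold-out
# set alone maps the same token to a different column.
separate_vocab = build_vocabulary(holdout_docs)
print(vocab["cheap"], separate_vocab["cheap"])
```

With a shared vocabulary, column `i` means the same token in both sets, so the rows can be wrapped in `xgb.DMatrix` objects and the hold-out error becomes comparable to the cross-validation error.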
