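I am using SVC from scikit-learn on a large 10000x1000 dataset (10000 objects with 1000 features each). I have read in other sources that LIBSVM does not scale well beyond ~10000 objects, and I do indeed observe this: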
training time for 10000 objects: 18.9s
training time for 12000 objects: 44.2s
training time for 14000 objects: 92.7s
You can imagine what happens when I try 80000. What I find very surprising, however, is that the SVM's predict() takes even longer than training with fit():
prediction time for 10000 objects (model was also trained on those objects): 49.0s
prediction time for 12000 objects (model was also trained on those objects): 91.5s
prediction time for 14000 objects (model was also trained on those objects): 141.84s
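Getting prediction to run in time linear in the number of objects is trivial (although it may well be close to linear here), and prediction is usually much faster than training. So what is going on here?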
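A minimal sketch of how timings like the ones above could be measured. It assumes synthetic Gaussian data, arbitrary alternating labels, and default SVC parameters, so the absolute numbers will differ from those reported. `time_fit_and_predict` is a hypothetical helper name used only for illustration.

```python
import time

import numpy as np
from sklearn.svm import SVC


def time_fit_and_predict(n_objects, n_features=1000, seed=0):
    """Time SVC.fit and SVC.predict on the same synthetic data.

    Returns (fit_seconds, predict_seconds, number_of_support_vectors).
    """
    rng = np.random.default_rng(seed)
    # Assumption: random Gaussian features stand in for the real dataset.
    X = rng.standard_normal((n_objects, n_features))
    # Assumption: alternating 0/1 labels, so both classes are always present.
    y = np.arange(n_objects) % 2

    clf = SVC()  # default parameters (RBF kernel)

    start = time.perf_counter()
    clf.fit(X, y)
    fit_seconds = time.perf_counter() - start

    # Predict on the training objects, as in the measurements above.
    start = time.perf_counter()
    clf.predict(X)
    predict_seconds = time.perf_counter() - start

    return fit_seconds, predict_seconds, int(clf.n_support_.sum())


if __name__ == "__main__":
    # Smaller sizes than in the question, to keep the run short.
    for n in (1000, 2000, 4000):
        fit_s, predict_s, n_sv = time_fit_and_predict(n)
        print(f"{n} objects: fit {fit_s:.1f}s, predict {predict_s:.1f}s, "
              f"{n_sv} support vectors")
```

Printing the number of support vectors alongside the timings may help when comparing runs, since the fitted model stores them and uses them at prediction time.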