python - Any python Support Vector Machine library around that allows online learning?

Question

I know there are some libraries that let you use Support Vector Machines from Python code, but I am looking specifically for libraries that allow training one online (that is, without having to give it all the data at once).

Are there any?

score 8 · Accepted Answer

LibSVM includes a Python wrapper that works via SWIG.

Example svm-test.py from their distribution:

#!/usr/bin/env python

from svm import *

# a three-class problem
labels = [0, 1, 1, 2]
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
problem = svm_problem(labels, samples);
size = len(samples)

kernels = [LINEAR, POLY, RBF]
kname = ['linear','polynomial','rbf']

param = svm_parameter(C = 10,nr_weight = 2,weight_label = [1,0],weight = [10,1])
for k in kernels:
    param.kernel_type = k;
    model = svm_model(problem,param)
    errors = 0
    for i in range(size):
        prediction = model.predict(samples[i])
        probability = model.predict_probability
        if (labels[i] != prediction):
            errors = errors + 1
    print "##########################################"
    print " kernel %s: error rate = %d / %d" % (kname[param.kernel_type], errors, size)
    print "##########################################"

param = svm_parameter(kernel_type = RBF, C=10)
model = svm_model(problem, param)
print "##########################################"
print " Decision values of predicting %s" % (samples[0])
print "##########################################"

print "Numer of Classes:", model.get_nr_class()
d = model.predict_values(samples[0])
for i in model.get_labels():
    for j in model.get_labels():
        if j>i:
            print "{%d, %d} = %9.5f" % (i, j, d[i,j])

param = svm_parameter(kernel_type = RBF, C=10, probability = 1)
model = svm_model(problem, param)
pred_label, pred_probability = model.predict_probability(samples[1])
print "##########################################"
print " Probability estimate of predicting %s" % (samples[1])
print "##########################################"
print "predicted class: %d" % (pred_label)
for i in model.get_labels():
    print "prob(label=%d) = %f" % (i, pred_probability[i])

print "##########################################"
print " Precomputed kernels"
print "##########################################"
samples = [[1, 0, 0, 0, 0], [2, 0, 1, 0, 1], [3, 0, 0, 1, 1], [4, 0, 1, 1, 2]]
problem = svm_problem(labels, samples);
param = svm_parameter(kernel_type=PRECOMPUTED,C = 10,nr_weight = 2,weight_label = [1,0],weight = [10,1])
model = svm_model(problem, param)
pred_label = model.predict(samples[0])

score 4 · Accepted Answer

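I haven't heard of one. But do you really need online learning? I have been using SVMs for quite some time and have never run into a problem where I had to use online learning. Usually I set a threshold on the number of changes to the training examples (maybe 100 or 1000) and then just retrain on all of them in batch.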
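The threshold idea could look roughly like the sketch below. It is only an illustration: `ThresholdRetrainer` and `train_fn` are hypothetical names, and `train_fn` stands in for whatever batch SVM trainer you already use (it is assumed to take the samples and labels and return a model).

```python
class ThresholdRetrainer:
    """Collects examples and retrains in batch once enough changes accumulate."""

    def __init__(self, train_fn, threshold=100):
        self.train_fn = train_fn    # hypothetical callable: (samples, labels) -> model
        self.threshold = threshold  # number of changes that triggers a retrain
        self.samples = []
        self.labels = []
        self.pending = 0            # changes since the last retrain
        self.model = None

    def add(self, sample, label):
        """Store one new example; retrain on everything when the threshold is hit."""
        self.samples.append(sample)
        self.labels.append(label)
        self.pending += 1
        if self.pending >= self.threshold:
            self.model = self.train_fn(self.samples, self.labels)
            self.pending = 0
            return True   # a batch retrain happened
        return False      # still using the previous model
```

Between retrains the old model keeps serving predictions, so the threshold trades freshness against training cost.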

If your problem is of a scale where you absolutely have to use online learning, then you might want to take a look at vowpal wabbit.

Edited below, after the comments:

Olivier Grisel suggested using a ctypes wrapper around LaSVM. Since I didn't know about LaSVM before and it looks pretty cool, I'm intrigued to try it on my own problems :).

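If you are limited to the Python VM only (embedded device, robot), I'd suggest using a voted/averaged perceptron, which performs close to an SVM but is easy to implement and "online" by default.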
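For reference, an averaged perceptron fits in a few lines of pure Python, with no numpy needed. This is a minimal sketch for the binary case, assuming labels are +1 and -1; the class and method names are my own.

```python
class AveragedPerceptron:
    """Binary averaged perceptron; labels must be +1 or -1."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features      # current weights
        self.b = 0.0                     # current bias
        self.w_sum = [0.0] * n_features  # running sum of the weights over all steps
        self.b_sum = 0.0                 # running sum of the bias over all steps
        self.steps = 0

    def partial_fit(self, x, y):
        """Online update from a single example (x, y)."""
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        if y * score <= 0:  # mistake: move the hyperplane toward the example
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
            self.b += y
        # Accumulate after every example, whether or not it caused an update.
        self.w_sum = [s + wi for s, wi in zip(self.w_sum, self.w)]
        self.b_sum += self.b
        self.steps += 1

    def predict(self, x):
        """Predict with the averaged weights."""
        # Dividing the sums by `steps` would not change the sign, so skip it.
        score = sum(s * xi for s, xi in zip(self.w_sum, x)) + self.b_sum
        return 1 if score >= 0 else -1
```

Call `partial_fit` once per incoming example; nothing has to be stored besides the two weight vectors.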

Just saw that Elefant has some online SVM code.

score 1 · Accepted Answer

While there are no Python bindings there, the algorithm described at http://leon.bottou.org/projects/sgd is trained in an online fashion and can easily be reimplemented using e.g. numpy.
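To give an idea of how little code that takes, here is a minimal numpy sketch of stochastic gradient descent on the L2-regularized hinge loss, i.e. a linear SVM. It is not the code from that page: the class name, the default values of `lam` and `eta0`, and the decaying step-size schedule are assumptions made for illustration.

```python
import numpy as np


class OnlineLinearSVM:
    """Linear SVM trained by SGD on the L2-regularized hinge loss."""

    def __init__(self, n_features, lam=0.01, eta0=0.1):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lam = lam    # regularization strength (assumed default)
        self.eta0 = eta0  # initial step size (assumed default)
        self.t = 0        # number of examples seen so far

    def partial_fit(self, x, y):
        """One SGD step on a single example; y must be +1 or -1."""
        x = np.asarray(x, dtype=float)
        # Step size that decays as more examples are seen (assumed schedule).
        eta = self.eta0 / (1.0 + self.lam * self.eta0 * self.t)
        margin = y * (self.w @ x + self.b)
        self.w *= 1.0 - eta * self.lam  # gradient step on the regularizer
        if margin < 1.0:                # hinge loss is active for this example
            self.w += eta * y * x
            self.b += eta * y
        self.t += 1

    def decision_function(self, x):
        return float(self.w @ np.asarray(x, dtype=float) + self.b)

    def predict(self, x):
        return 1 if self.decision_function(x) >= 0 else -1
```

Each call to `partial_fit` touches one example only, so the data can arrive as a stream. This only covers the linear kernel.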

score 1 · Accepted Answer

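Pegasos is an online SVM algorithm that performs quite nicely. It is also fairly easy to implement, even without a specific Python binding. There is a C implementation on the author's website that is adaptable or embeddable as well.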
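As a rough illustration of "fairly easy to implement", here is a numpy sketch of the basic linear Pegasos update as I understand it: step size 1/(lam * t), a hinge-loss subgradient step, and the optional projection onto a ball of radius 1/sqrt(lam). There is no bias term in this version, and the function names and defaults are mine, not the author's.

```python
import numpy as np


def pegasos_step(w, x, y, t, lam):
    """One Pegasos update on example (x, y) at step t >= 1; y is +1 or -1."""
    eta = 1.0 / (lam * t)             # step size shrinks as 1/t
    violated = y * (w @ x) < 1.0      # margin check uses the weights before the update
    w = (1.0 - eta * lam) * w         # shrink (regularization part of the subgradient)
    if violated:
        w = w + eta * y * x           # hinge-loss part of the subgradient
    norm = np.linalg.norm(w)
    radius = 1.0 / np.sqrt(lam)
    if norm > radius:                 # optional projection step
        w = w * (radius / norm)
    return w


def pegasos_train(X, y, lam=0.1, n_iters=1000, seed=0):
    """Run Pegasos on randomly drawn examples; returns the weight vector."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, n_iters + 1):
        i = rng.integers(len(X))      # pick one training example at random
        w = pegasos_step(w, X[i], y[i], t, lam)
    return w
```

For a true streaming setup you would call `pegasos_step` directly on each incoming example and keep `t` as a running counter; `pegasos_train` is just a convenience loop over a fixed data set.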

score 0 · Accepted Answer

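Why would you want to train it online? Adding training instances usually requires re-solving the quadratic programming problem associated with the SVM.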

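One way to handle this is to train an SVM in batch mode and, when new data becomes available, check whether those data points fall within the [-1, +1] margin of the hyperplane. If so, retrain the SVM using all the old support vectors plus the new training data that falls within the margin.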
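The selection step can be sketched as follows. `build_retraining_set` and `decision_fn` are hypothetical names: `decision_fn` is assumed to return the current model's decision value f(x) for a sample, and the actual retraining is left to whatever batch SVM library you use.

```python
def build_retraining_set(support_vectors, support_labels,
                         new_samples, new_labels, decision_fn):
    """Return (samples, labels) to retrain on, or None if no retrain is needed."""
    # Keep only the new points whose decision value lies in the [-1, +1] margin.
    inside = [(x, y) for x, y in zip(new_samples, new_labels)
              if -1.0 <= decision_fn(x) <= 1.0]
    if not inside:
        return None  # nothing fell in the margin, keep the current model
    samples = list(support_vectors) + [x for x, _ in inside]
    labels = list(support_labels) + [y for _, y in inside]
    return samples, labels
```

If the function returns a retraining set, feed it to the batch trainer and replace the old model and its support vectors with the result.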

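Of course, the results can differ slightly from batch training on all your data, because some points that would later become support vectors may have been discarded. So again, why do you want to perform online training of your SVM?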
