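While working with libsvm I ran into a puzzling problem. When I test my data with libsvm, the accuracy is absurdly low (1%). I don't know whether that accuracy is expected or whether I did something wrong, but when I run the easy.py script, the following warning appears many times while svm-scale runs.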

WARNING: feature index i appeared in test.libsvm was not seen in the scaling factor file train.libsvm.range.

How can I resolve this warning? Would fixing it improve my accuracy?

EDIT: The contents of train.libsvm.range are as follows:

x
-1 1
2 -1 0
3 -1 0
4 1 2
5 -1 0
6 -1 0
7 -1 0
8 0 1
9 0 1
10 -1 0
11 0 1
12 2 3
13 -1 0
14 -1 0
15 -1 0
16 0 2
17 -1 0
18 -2 0
19 -2 0
20 0 1
21 0 2
23 0 1
24 2 3
25 0 1
26 -1 0
27 -1 0
28 1 2
29 -1 0
30 -1 0
31 -1 0
32 0 2
36 0 1

EDIT: Here you can see the training file and the test file.

1 Answer

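This happens because your test data contains features that are not in the training data used to generate the scaling file. Check that your training and test datasets match. If your test data (or training data) came from the wrong dataset, getting the correct data will probably solve the problem.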

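For example, in your training data file feature number 3 is always zero, so it is not included in the scaling factor file, yet it does have a value on line 5626 of your test file: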

-1 1:0 2:-1 3:-1 4:2 5:-1 6:-1 7:-1 8:0 9:0 10:0 11:0 12:2 13:0 14:-1

Because feature 3 has a value in the test file but is absent from the scaling factor file, you get the warning message.
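The mismatch described above can be checked directly on the two data files. The following is a minimal sketch, not part of libsvm: the helper names (`feature_ranges`, `varying`, `unseen_in_training`) are hypothetical, and it assumes that svm-scale leaves a feature out of the range file when its value never changes in the training data, and warns when such a feature does change in the test data. Features omitted from a line are counted as zero, as in the sparse libsvm format.

```python
from collections import defaultdict


def feature_ranges(lines):
    """Return {index: (min, max)} for libsvm-format lines.

    A feature omitted from some line is treated as having value 0 there.
    """
    lo, hi, count = {}, {}, defaultdict(int)
    n_rows = 0
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        n_rows += 1
        for tok in tokens[1:]:  # tokens[0] is the label
            idx_s, val_s = tok.split(":")
            idx, val = int(idx_s), float(val_s)
            lo[idx] = min(lo.get(idx, val), val)
            hi[idx] = max(hi.get(idx, val), val)
            count[idx] += 1
    for idx in lo:
        if count[idx] < n_rows:  # omitted somewhere, so 0 is also a value
            lo[idx] = min(lo[idx], 0.0)
            hi[idx] = max(hi[idx], 0.0)
    return {idx: (lo[idx], hi[idx]) for idx in lo}


def varying(ranges):
    """Indices whose value actually changes (min != max)."""
    return {i for i, (a, b) in ranges.items() if a != b}


def unseen_in_training(train_lines, test_lines):
    """Indices that vary in the test data but are constant or absent in training."""
    return sorted(varying(feature_ranges(test_lines))
                  - varying(feature_ranges(train_lines)))


if __name__ == "__main__":
    # Tiny made-up example: feature 3 is always 0 in training, but not in test.
    train = ["1 1:0 2:-1 3:0 4:1", "-1 1:1 2:0 3:0 4:2"]
    test = ["-1 1:0 2:-1 3:-1 4:2", "1 1:1 2:0 3:0 4:1"]
    print(unseen_in_training(train, test))
```

To run it on real files, pass open file objects instead of the lists, for example `unseen_in_training(open("train.libsvm"), open("test.libsvm"))`. Every index it reports is a candidate for the warning.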

I'm not sure where the train.libsvm.range contents you posted came from, because when I generate it from the test data, I get:

 x
 -1 1
 2 -1 0
 4 0 2   ** note 3 is missing **
 5 -1 0
 6 -1 0
 7 -1 0
 8 0 1
 9 0 1
 12 0 3    ** note 10, 11 are missing **
 etc.

Check that you are using the correct test and training data.
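One way to do that check is to read the range file itself and compare it with a line from the test file. This is a rough sketch under stated assumptions: the function names are hypothetical, and the parser assumes the layout shown in this post (a line containing `x`, then the lower and upper bounds, then one `index min max` triple per line). An index reported here only produces the svm-scale warning if that feature's value actually changes within the test file.

```python
def parse_range_file(text):
    """Parse the 'x' section of a range file into (lower, upper, {index: (min, max)})."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    start = lines.index("x")  # assumed marker for the feature section
    lower, upper = map(float, lines[start + 1].split())
    ranges = {}
    for ln in lines[start + 2:]:
        parts = ln.split()
        if len(parts) != 3:
            break  # stop at anything that is not an 'index min max' triple
        ranges[int(parts[0])] = (float(parts[1]), float(parts[2]))
    return lower, upper, ranges


def indices_missing_from_range(data_line, ranges):
    """Feature indices on one libsvm-format line that have no entry in the range file."""
    indices = [int(tok.split(":")[0]) for tok in data_line.split()[1:]]
    return [i for i in indices if i not in ranges]


if __name__ == "__main__":
    # Made-up range file in which index 3 has no entry.
    sample_range = "x\n-1 1\n1 0 1\n2 -1 0\n4 0 2\n"
    lower, upper, ranges = parse_range_file(sample_range)
    print(indices_missing_from_range("-1 1:0 2:-1 3:-1 4:2", ranges))
```

In practice the text would come from the `.range` file that easy.py writes next to the training file, read with `open(...).read()`.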

One more thing: running easy.py, I get 65% accuracy rather than 1%:

    $ ./easy.py train_libsvm.mht test_sdx.mht
    Scaling training data...
    WARNING: original #nonzeros 7560
             new      #nonzeros 15748
    Use -l 0 if many original feature values are zeros
    Cross validation...
    Best c=512.0, g=0.0001220703125 CV rate=70.0
    Training...
    Output model: train_libsvm.mht.model
    Scaling testing data...
    WARNING: feature index 3 appeared in file test_sdx.mht was not seen in the scaling factor file train_libsvm.mht.range.
    WARNING: feature index 10 appeared in file test_sdx.mht was not seen in the scaling factor file train_libsvm.mht.range.
    WARNING: feature index 11 appeared in file test_sdx.mht was not seen in the scaling factor file train_libsvm.mht.range.
    WARNING: feature index 13 appeared in file test_sdx.mht was not seen in the scaling factor file train_libsvm.mht.range.
    WARNING: feature index 22 appeared in file test_sdx.mht was not seen in the scaling factor file train_libsvm.mht.range.
    WARNING: feature index 25 appeared in file test_sdx.mht was not seen in the scaling factor file train_libsvm.mht.range.
    WARNING: original #nonzeros 67740
             new      #nonzeros 169332
    Use -l 0 if many original feature values are zeros
    Testing...
    Accuracy = 65.651% (3706/5645) (classification)
answered 2013-11-08T08:49:38.360