
I am trying to understand the vowpal_wabbit data format for training and test data, but I can't seem to make sense of it.

I have some training data, for example:

Feature 1:0 Feature 2:1 Feature 3:10 Feature 4:5 Class label:A

Feature 1:0 Feature 2:2 Feature 3:30 Feature 4:8 Class label:C

Feature 1:2 Feature 2:10 Feature 3:9 Feature 4:7 Class label:B
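
For illustration, rows like these could be written out in the "[label] | name:value ..." text format that the validator output further down recognizes. This is only a minimal sketch under stated assumptions: the feature names f1..f4, the helper to_vw_line, and the mapping of class labels A/B/C to the integers 1..3 (VW's multiclass mode --oaa expects integer labels starting at 1) are all chosen for the example and are not part of the original data.

```python
# Assumed mapping from the class labels in the question to integer labels.
LABEL_TO_INT = {"A": 1, "B": 2, "C": 3}


def to_vw_line(features, label):
    """Format one example as '<int label> | name:value name:value ...'."""
    feats = " ".join(f"{name}:{value}" for name, value in features.items())
    return f"{LABEL_TO_INT[label]} | {feats}"


# The three rows from the question, with hypothetical feature names f1..f4.
rows = [
    ({"f1": 0, "f2": 1, "f3": 10, "f4": 5}, "A"),
    ({"f1": 0, "f2": 2, "f3": 30, "f4": 8}, "C"),
    ({"f1": 2, "f2": 10, "f3": 9, "f4": 7}, "B"),
]

vw_lines = [to_vw_line(features, label) for features, label in rows]
for line in vw_lines:
    print(line)
```

Joining the features with single spaces and adding nothing after the last one keeps the lines free of trailing whitespace.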

I have been experimenting with some sample training data using this website:

http://hunch.net/~vw/validate.html

My validation data:

1 | haha:1 hehe:2 hoho:3
1 | haha:2 hehe:2 hoho:3 
3 | haha:3 hehe:2 hoho:3 
1 | haha:4 hehe:2 hoho:3 
2 | haha:5 hehe:2 hoho:3  

However, I don't understand why it claims that some of my examples have 4 and 5 features, respectively.

Validation:

Validation feedback

Total of 5 examples pasted.

(example #1) Example “1 | haha:1 hehe:2 hoho:3”.
(example #1) Found “[label] |…” prefix format.
(example #1) Example label / response / class is “1”.
(example #1) Example has default “1.0” importance weight.
(example #1) Example has default “0” base.
(example #1, namespace #1) Using default namespace.
(example #1, namespace #1) Found 3 feature(s).
(example #1, namespace #1, feature #1) Label “haha”.
(example #1, namespace #1, feature #1) Value “1”.
(example #1, namespace #1, feature #2) Label “hehe”.
(example #1, namespace #1, feature #2) Value “2”.
(example #1, namespace #1, feature #3) Label “hoho”.
(example #1, namespace #1, feature #3) Value “3”.

(example #2) Example “1 | haha:2 hehe:2 hoho:3 ”.
(example #2) Found “[label] |…” prefix format.
(example #2) Example label / response / class is “1”.
(example #2) Example has default “1.0” importance weight.
(example #2) Example has default “0” base.
(example #2, namespace #1) Using default namespace.
(example #2, namespace #1) Found 4 feature(s).
(example #2, namespace #1, feature #1) Label “haha”.
(example #2, namespace #1, feature #1) Value “2”.
(example #2, namespace #1, feature #2) Label “hehe”.
(example #2, namespace #1, feature #2) Value “2”.
(example #2, namespace #1, feature #3) Label “hoho”.
(example #2, namespace #1, feature #3) Value “3”.
(example #2, namespace #1, feature #4) Label “”.
(example #2, namespace #1, feature #4) Using default value of “1” for feature.

(example #3) Example “3 | haha:3 hehe:2 hoho:3 ”.
(example #3) Found “[label] |…” prefix format.
(example #3) Example label / response / class is “3”.
(example #3) Example has default “1.0” importance weight.
(example #3) Example has default “0” base.
(example #3, namespace #1) Using default namespace.
(example #3, namespace #1) Found 4 feature(s).
(example #3, namespace #1, feature #1) Label “haha”.
(example #3, namespace #1, feature #1) Value “3”.
(example #3, namespace #1, feature #2) Label “hehe”.
(example #3, namespace #1, feature #2) Value “2”.
(example #3, namespace #1, feature #3) Label “hoho”.
(example #3, namespace #1, feature #3) Value “3”.
(example #3, namespace #1, feature #4) Label “”.
(example #3, namespace #1, feature #4) Using default value of “1” for feature.

(example #4) Example “1 | haha:4 hehe:2 hoho:3 ”.
(example #4) Found “[label] |…” prefix format.
(example #4) Example label / response / class is “1”.
(example #4) Example has default “1.0” importance weight.
(example #4) Example has default “0” base.
(example #4, namespace #1) Using default namespace.
(example #4, namespace #1) Found 4 feature(s).
(example #4, namespace #1, feature #1) Label “haha”.
(example #4, namespace #1, feature #1) Value “4”.
(example #4, namespace #1, feature #2) Label “hehe”.
(example #4, namespace #1, feature #2) Value “2”.
(example #4, namespace #1, feature #3) Label “hoho”.
(example #4, namespace #1, feature #3) Value “3”.
(example #4, namespace #1, feature #4) Label “”.
(example #4, namespace #1, feature #4) Using default value of “1” for feature.

(example #5) Example “2 | haha:5 hehe:2 hoho:3 ”.
(example #5) Found “[label] |…” prefix format.
(example #5) Example label / response / class is “2”.
(example #5) Example has default “1.0” importance weight.
(example #5) Example has default “0” base.
(example #5, namespace #1) Using default namespace.
(example #5, namespace #1) Found 5 feature(s).
(example #5, namespace #1, feature #1) Label “haha”.
(example #5, namespace #1, feature #1) Value “5”.
(example #5, namespace #1, feature #2) Label “hehe”.
(example #5, namespace #1, feature #2) Value “2”.
(example #5, namespace #1, feature #3) Label “hoho”.
(example #5, namespace #1, feature #3) Value “3”.
(example #5, namespace #1, feature #4) Label “”.
(example #5, namespace #1, feature #4) Using default value of “1” for feature.
(example #5, namespace #1, feature #5) Label “”.
(example #5, namespace #1, feature #5) Using default value of “1” for feature.

1 Answer


why it claims that I have 4 and 5 features, respectively

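The extra space characters at the end of a line are interpreted by http://hunch.net/~vw/validate.html as extra features. (Yes, the last line in your example has two trailing spaces.) Note that validate.html reports an empty name for each extra feature: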

(example #4, namespace #1, feature #4) Label “”.
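
To see how trailing spaces can turn into empty-named features, here is a minimal Python sketch. It is not the validator's actual JavaScript code; the two helper functions are hypothetical and only contrast a parser that splits on single spaces with one that splits on any run of whitespace.

```python
def naive_feature_tokens(example):
    """Split the feature part on single spaces; trailing spaces yield '' tokens."""
    features_part = example.split("|", 1)[1].lstrip(" ")
    return features_part.split(" ")


def whitespace_tolerant_tokens(example):
    """Split on any run of whitespace; leading/trailing whitespace is ignored."""
    features_part = example.split("|", 1)[1]
    return features_part.split()


examples = [
    "1 | haha:1 hehe:2 hoho:3",    # no trailing space
    "1 | haha:2 hehe:2 hoho:3 ",   # one trailing space
    "2 | haha:5 hehe:2 hoho:3  ",  # two trailing spaces
]

naive_counts = [len(naive_feature_tokens(e)) for e in examples]
tolerant_counts = [len(whitespace_tolerant_tokens(e)) for e in examples]
print(naive_counts)     # [3, 4, 5]
print(tolerant_counts)  # [3, 3, 3]
```

The naive counts match the 3, 4 and 5 features reported in the validation feedback, and the extra tokens are empty strings, just like the empty labels shown there.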

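Note that validate.html is implemented in JavaScript and is completely independent of the implementation of VW itself (which is written in C++). VW ignores trailing whitespace. You can test this with: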

$ vw -P 1 < sample.data
...   
average    since         example     example  current  current  current
loss       last          counter      weight    label  predict features
1.000000   1.000000          1      1.0     1.0000   0.0000        4
0.522042   0.044084          2      2.0     1.0000   0.7900        4
1.838150   4.470366          3      3.0     3.0000   0.8857        4
1.488676   0.440255          4      4.0     1.0000   1.6635        4
1.270585   0.398217          5      5.0     2.0000   1.3690        4

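So all five examples are reported as having 4 features (see the last column). Why four? An extra constant (intercept) feature is added automatically. If you don't want it, you can use vw --noconstant.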
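
If you want the validator and VW to agree, one option is to strip trailing whitespace before pasting or training. The sketch below is a hypothetical helper, not part of VW. The feature count it computes assumes the simple single-namespace format used in the question, with no feature interactions, plus one constant feature unless --noconstant is given.

```python
def strip_trailing_whitespace(lines):
    """Return the lines with whitespace at the end of each line removed."""
    return [line.rstrip() for line in lines]


def expected_feature_count(line, constant=True):
    """Named features after '|', plus 1 for the constant (intercept) feature."""
    named = len(line.split("|", 1)[1].split())
    return named + (1 if constant else 0)


raw_lines = [
    "1 | haha:1 hehe:2 hoho:3",
    "1 | haha:2 hehe:2 hoho:3 ",
    "2 | haha:5 hehe:2 hoho:3  ",
]

cleaned = strip_trailing_whitespace(raw_lines)
counts_with_constant = [expected_feature_count(l) for l in cleaned]
counts_without_constant = [expected_feature_count(l, constant=False) for l in cleaned]
print(counts_with_constant)     # [4, 4, 4]
print(counts_without_constant)  # [3, 3, 3]
```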

Answered 2015-03-05T19:13:42.633