What parameters should I use in VW for a binary classification task? Take rcv1_small.dat as an example. I assumed it would be better to use the logistic loss function (or hinge loss), and that it makes no sense to use --oaa 2. However, the empirical results (the progressive validation 0/1 loss reported in all four experiments below) show that the best combination is --oaa 2 without logistic loss, i.e. with the default squared loss:
cd vowpal_wabbit/test/train-sets
cat rcv1_small.dat | vw --binary
# average loss = 0.0861
cat rcv1_small.dat | vw --binary --loss_function=logistic
# average loss = 0.0909
cat rcv1_small.dat | sed 's/^-1/2/' | vw --oaa 2
# average loss = 0.0857
cat rcv1_small.dat | sed 's/^-1/2/' | vw --oaa 2 --loss_function=logistic
# average loss = 0.0934
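To rerun all four experiments in one go and keep only the final numbers, a small POSIX shell sketch like the one below could be used. It assumes vw is on the PATH, that it is run from vowpal_wabbit/test/train-sets, and that VW prints its summary (including the "average loss = ..." line quoted above) to stderr. The helper names run_exp and run_all and the DATA variable are hypothetical, introduced only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: reruns the four experiments from the question and
# prints only VW's final "average loss" line for each of them.
# Assumes vw is on PATH and DATA points at rcv1_small.dat.
DATA=${DATA:-rcv1_small.dat}

# run_exp NAME CMD...: runs CMD on the current stdin and keeps only the
# summary line. VW writes its progress table and summary to stderr, so
# stderr is merged into stdout before filtering.
run_exp() {
  name=$1
  shift
  result=$("$@" 2>&1 | grep 'average loss')
  printf '%s: %s\n' "$name" "$result"
}

run_all() {
  # --binary with the default squared loss, then with logistic loss
  run_exp "binary" vw --binary < "$DATA"
  run_exp "binary+logistic" vw --binary --loss_function=logistic < "$DATA"
  # --oaa expects class ids 1..k, so relabel -1 as class 2 first
  sed 's/^-1/2/' "$DATA" | run_exp "oaa2" vw --oaa 2
  sed 's/^-1/2/' "$DATA" | run_exp "oaa2+logistic" vw --oaa 2 --loss_function=logistic
}
```

After sourcing the script from vowpal_wabbit/test/train-sets, calling run_all should print one line per experiment, labelled binary, binary+logistic, oaa2 and oaa2+logistic.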
My primary question is: why does --oaa 2 not give exactly the same results as --binary in the above setting?
My secondary questions are: why does optimizing the logistic loss not improve the 0/1 loss compared to optimizing the default squared loss? Is this specific to this particular dataset?
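For reference, the sketch below tabulates the textbook forms of the three losses involved as functions of the margin m = y * f(x), where y is the -1/+1 label and f(x) is the raw (unthresholded) score. It uses plain awk and does not call VW; VW's own implementation may differ in details such as scaling or clipping of predictions. Counting a zero margin as an error is an assumption about tie-breaking, and the function name loss_table is hypothetical.

```shell
#!/bin/sh
# Tabulate 0/1, squared and logistic loss as functions of the margin
# m = y * f(x). With y in {-1, +1}, (y - f)^2 equals (1 - m)^2.
loss_table() {
  awk 'BEGIN {
    printf "%6s %6s %8s %9s\n", "margin", "0/1", "squared", "logistic"
    n = split("-2 -1 0 0.5 1 2 3", ms, " ")
    for (i = 1; i <= n; i++) {
      m = ms[i] + 0
      zo = (m <= 0) ? 1 : 0          # a zero margin is counted as an error
      sq = (1 - m) ^ 2               # squared loss
      lg = log(1 + exp(-m))          # logistic loss (natural log)
      printf "%6.1f %6d %8.3f %9.3f\n", m, zo, sq, lg
    }
  }'
}

loss_table
```

In the resulting table the squared loss rises again for m > 1, so it also penalizes confidently correct predictions, while the logistic loss keeps decreasing; neither surrogate coincides with the 0/1 loss that progressive validation reports.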