
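Sorry if this has been asked before. Instead of raw predictions (-r), I would like to get predictions in the [0, 1] interval from an SVM trained in Vowpal Wabbit with --loss_function hinge. Currently I'm trying the following, but it doesn't give me what I want. Any ideas?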

vw -d vw_train_rand.vw -c -f svm_rand.vw --passes 10 --loss_function hinge -q cn;

vw -d vw_test_rand.vw -t -i svm_rand.vw -p preds_rand_svm.txt
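The -q cn flag in the training command asks VW to add interaction features between namespaces c and n. The sketch below shows the idea conceptually: every feature of one namespace is paired with every feature of the other, and the crossed value is the product of the two values. It is an illustration only; real VW hashes the crossed pairs into its weight table, which this does not model, and the helper name quadratic_cross and the miniature namespaces are hypothetical.

```python
from itertools import product


def quadratic_cross(ns_a, ns_b):
    """Pair every feature of ns_a with every feature of ns_b.

    The crossed value is the product of the two feature values.
    Conceptual sketch only: VW itself hashes these pairs instead of
    building named features like this.
    """
    return {
        f"{a}*{b}": va * vb
        for (a, va), (b, vb) in product(ns_a.items(), ns_b.items())
    }


# Hypothetical miniature namespaces modelled on the sample data below
c = {"Loan.TypeConventional": 1.0, "Loan.TypeFHA": 0.0}
n = {"Loan.Size": 124500.0, "MailDateMonth": 5.0}

crossed = quadratic_cross(c, n)
for name, value in crossed.items():
    print(name, value)
```

Note that crossing a 0-valued indicator (such as Loan.TypeFHA:0 here) yields 0-valued pairs, which contribute nothing to a linear model.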

Cheers

Aaron

Edit:

1) Sample data:

-1 |c Loan.TypeConventional:1 Loan.TypeFHA:0 Loan.TypeUnknown:0 Loan.TypeVA:0 |n Loan.Size:124500 LenderRank0612.0614:1939 ZipSquareMiles:53.1 MailDateMonth:5 ZipPerForeignBorn:11.4 ZipPerHighSchoolPlusDegree:57.2 ZipPerCollegePlusDegree:15.2 ZipPerVeterans:13.4 ZipPopPerSquareMile:798.1 ZipPerUnemployement:8.5 ZipSexRatio:96.7 ZipHousingUnitsPerSquareMile:315.1 ZipMedianHouseholdIncome:36238 ZipPerCapitaIncome:19085 MonthsDeedDatetoMailDate:2
-1 |c Loan.TypeConventional:1 Loan.TypeFHA:0 Loan.TypeUnknown:0 Loan.TypeVA:0 |n Loan.Size:232000 LenderRank0612.0614:391 ZipSquareMiles:99.1 MailDateMonth:5 ZipPerForeignBorn:11.8 ZipPerHighSchoolPlusDegree:73.3 ZipPerCollegePlusDegree:39.3 ZipPerVeterans:9.1 ZipPopPerSquareMile:485.5 ZipPerUnemployement:5.9 ZipSexRatio:98.5 ZipHousingUnitsPerSquareMile:169.6 ZipMedianHouseholdIncome:78465 ZipPerCapitaIncome:31908 MonthsDeedDatetoMailDate:3
-1 |c Loan.TypeConventional:1 Loan.TypeFHA:0 Loan.TypeUnknown:0 Loan.TypeVA:0 |n Loan.Size:90000 LenderRank0612.0614:130 ZipSquareMiles:32.6 MailDateMonth:5 ZipPerForeignBorn:51.5 ZipPerHighSchoolPlusDegree:60.7 ZipPerCollegePlusDegree:17.3 ZipPerVeterans:9.3 ZipPopPerSquareMile:783.2 ZipPerUnemployement:4.8 ZipSexRatio:97.2 ZipHousingUnitsPerSquareMile:274.2 ZipMedianHouseholdIncome:64668 ZipPerCapitaIncome:25632 MonthsDeedDatetoMailDate:3
-1 |c Loan.TypeConventional:0 Loan.TypeFHA:0 Loan.TypeUnknown:0 Loan.TypeVA:1 |n Loan.Size:121301 LenderRank0612.0614:23 ZipSquareMiles:6.8 MailDateMonth:5 ZipPerForeignBorn:14.9 ZipPerHighSchoolPlusDegree:63.9 ZipPerCollegePlusDegree:24.2 ZipPerVeterans:10 ZipPopPerSquareMile:5245.1 ZipPerUnemployement:7.1 ZipSexRatio:93.3 ZipHousingUnitsPerSquareMile:2001.6 ZipMedianHouseholdIncome:56398 ZipPerCapitaIncome:25815 MonthsDeedDatetoMailDate:2
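The sample lines follow VW's text input format: a label, then one or more |namespace sections of name:value features. As a rough aid for inspecting such files, here is a minimal parser sketch that assumes exactly the shape shown above (namespace name immediately after each |, every feature written as name:value or bare name). Importance weights, tags, the default namespace and other VW syntax are deliberately not handled, and parse_vw_line is a hypothetical helper, not part of VW.

```python
def parse_vw_line(line):
    """Split a simple VW text-format line into (label, namespaces).

    namespaces maps namespace name -> {feature name: value}.
    Assumes 'label |ns name:value ... |ns2 ...'; a bare feature name
    without ':' gets the value 1.0. Other VW syntax is not handled.
    """
    head, *sections = line.strip().split("|")
    label = float(head.split()[0])
    namespaces = {}
    for section in sections:
        tokens = section.split()
        ns_name, feature_tokens = tokens[0], tokens[1:]
        features = {}
        for tok in feature_tokens:
            name, sep, value = tok.partition(":")
            features[name] = float(value) if sep else 1.0
        namespaces[ns_name] = features
    return label, namespaces


# A shortened version of the first sample line
sample = "-1 |c Loan.TypeConventional:1 Loan.TypeFHA:0 |n Loan.Size:124500 MailDateMonth:5"
label, ns = parse_vw_line(sample)
print(label, ns)
```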

2) What I'm currently getting:

-1.001968
-1.000737
-1.000441
-1.001823

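3) What I'd like to see: predictions in the continuous [0, 1] interval, so that each entry can be interpreted as the predicted probability of the event, e.g.: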

0.012
0.009
0.010
0.0085

1 Answer


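If you want to predict probabilities, you should train with --loss_function=logistic and test with --link=logistic. Hinge loss (as used in SVMs) leads to a max-margin classifier, which is not suitable for predicting probabilities.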
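To see why the loss matters, compare the two losses on labels in {-1, +1} (the labels used in the sample data). The sketch below uses the standard textbook definitions, not VW's source: hinge loss max(0, 1 - y*s) is exactly zero once a score s is past the margin, so nothing pushes the scores toward calibrated values, whereas logistic loss log(1 + exp(-y*s)) is the negative log-likelihood of the label under the sigmoid model, so the sigmoid of a logistic-trained score can be read as an estimated probability. The logistic link is that sigmoid, 1 / (1 + exp(-s)). All function names here are illustrative.

```python
import math


def logistic_link(raw):
    """Sigmoid: map a raw linear score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-raw))


def logistic_loss(raw, label):
    """Logistic loss for label in {-1, +1}: log(1 + exp(-label * raw))."""
    return math.log1p(math.exp(-label * raw))


def hinge_loss(raw, label):
    """Hinge loss for label in {-1, +1}: max(0, 1 - label * raw)."""
    return max(0.0, 1.0 - label * raw)


# For a negative example (label -1): hinge loss is flat at 0 for any
# score <= -1, while logistic loss keeps decreasing smoothly.
for raw in (-4.0, -1.0, 0.0, 2.0):
    print(raw,
          round(logistic_link(raw), 3),
          hinge_loss(raw, -1),
          round(logistic_loss(raw, -1), 3))
```

Applying the sigmoid to hinge-trained scores would also produce numbers in (0, 1), but they would not be calibrated probabilities, which is the point of the answer: change the training loss, not just the output mapping.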

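Note that just using --loss_function=hinge does not turn VW into a kernel SVM: you get a linear model trained with hinge loss, with no kernel. If you want a support vector machine with a radial basis function kernel trained in an online fashion, use --ksvm --kernel=rbf (see vw --ksvm -h | grep -A9 KSVM for more parameters).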
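For reference, the standard RBF kernel is K(x, z) = exp(-gamma * ||x - z||^2), and a kernel SVM scores a point as a weighted sum of kernel values against its support vectors plus a bias. The sketch below shows only that textbook form on dense vectors; it is not VW's implementation, and the name gamma is an assumption here (check vw --ksvm -h for the option VW actually exposes for the kernel width). The resulting score is still a margin, not a probability.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Standard RBF kernel exp(-gamma * ||x - z||^2) on dense vectors."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def kernel_score(x, support_vectors, alphas, gamma=1.0, bias=0.0):
    """Kernel SVM decision value: sum_i alpha_i * K(sv_i, x) + bias.

    Each alpha_i is assumed to already include the label y_i.
    """
    return bias + sum(a * rbf_kernel(sv, x, gamma)
                      for sv, a in zip(support_vectors, alphas))


# Hypothetical toy model with two support vectors of opposite sign
svs = [[0.0, 0.0], [1.0, 1.0]]
alphas = [1.0, -1.0]
print(kernel_score([0.0, 0.0], svs, alphas))
```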

answered 2015-06-14T21:16:30.140