java - Probability predictions with liblinear (java), using the classifier directly in code

Question

Consider the following usage of liblinear ( http://liblinear.bwaldvogel.de/ ):

    double C = 1.0; // cost of constraints violation
    double eps = 0.01; // stopping criteria
    Parameter param = new Parameter(SolverType.L2R_L2LOSS_SVC, C, eps);
    Problem problem = new Problem();
    double[] GROUPS_ARRAY = {1, 0, 0, 0};
    problem.y = GROUPS_ARRAY;

    int NUM_OF_TS_EXAMPLES = 4;
    problem.l = NUM_OF_TS_EXAMPLES;
    problem.n = 2;

    FeatureNode[] instance1 = { new FeatureNode(1, 1), new FeatureNode(2, 1) };
    FeatureNode[] instance2 = { new FeatureNode(1, -1), new FeatureNode(2, 1) };
    FeatureNode[] instance3 = { new FeatureNode(1, -1), new FeatureNode(2, -1) };
    FeatureNode[] instance4 = { new FeatureNode(1, 1), new FeatureNode(2, -1) };

    FeatureNode[] instance5 = { new FeatureNode(1, 1), new FeatureNode(2, -0.1) };
    FeatureNode[] instance6 = { new FeatureNode(1, -0.1), new FeatureNode(2, 1) };
    FeatureNode[] instance7 = { new FeatureNode(1, -0.1), new FeatureNode(2, -0.1) };

    FeatureNode[][] testSetWithUnknown = {
            instance5,
            instance6, 
            instance7
        };

    FeatureNode[][] trainingSetWithUnknown = {
            instance1,
            instance2, 
            instance3, 
            instance4
        };

    problem.x = trainingSetWithUnknown;

    Model m = Linear.train(problem, param); 

    for( int i = 0; i < trainingSetWithUnknown.length; i++)
        System.out.println(" Train.instance =  " + i + " =>  " + Linear.predict(m, trainingSetWithUnknown[i]) ); 
    System.out.println("---------------------"); 
    for( int i = 0; i < testSetWithUnknown.length; i++)
        System.out.println(" Test.instance =  " + i + " =>  " + Linear.predict(m, testSetWithUnknown[i]) );

This is the output:

iter  1 act 1.778e+00 pre 1.778e+00 delta 6.285e-01 f 4.000e+00 |g| 5.657e+00 CG   1
 Train.instance =  0 =>  1.0
 Train.instance =  1 =>  0.0
 Train.instance =  2 =>  0.0
 Train.instance =  3 =>  0.0
---------------------
 Test.instance =  0 =>  1.0
 Test.instance =  1 =>  1.0
 Test.instance =  2 =>  0.0

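I need probability predictions rather than integer (hard) predictions. The command line has a -b option, but I could not find anything that lets me use that functionality directly in code. Also, looking inside the code (https://github.com/bwaldvogel/liblinear-java/blob/master/src/main/java/de/bwaldvogel/liblinear/Predict.java), there apparently is no option for probability prediction when the library is used directly from code. Is that right?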

Update: I ended up using the liblinear code from https://github.com/bwaldvogel/liblinear-java. In the file Predict.java I changed

    private static boolean       flag_predict_probability = true;

to

    private static boolean       flag_predict_probability = false;

and used

    SolverType.L2R_LR

but I still get the integer classes. Any ideas?

score 1 · Accepted Answer

To get probabilities, you need to change the code. Prediction happens inside the function

    public static double predictValues(Model model, Feature[] x, double[] dec_values) {

in Linear.java:

    if (model.nr_class == 2) {
        System.out.println("Two classes "); 
        if (model.solverType.isSupportVectorRegression()) { 
            System.out.println("Support vector");
            return dec_values[0];
        }
        else { 
            System.out.println("Not Support vector");
            return (dec_values[0] > 0) ? model.label[0] : model.label[1];
        }

    }

This needs to be changed to

    if (model.nr_class == 2) {
        System.out.println("Two classes "); 
        if (model.solverType.isSupportVectorRegression()) { 
            System.out.println("Support vector");
            return dec_values[0];
        }
        else { 
            System.out.println("Not Support vector");
            return dec_values[0]; 
        }    
    }

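Note that the output is still not a probability but the linear combination of the weights and the feature values. If you pass it through the logistic (sigmoid) function, 1 / (1 + exp(-x)), which is the two-class case of softmax, it becomes a probability in [0, 1]. Since a positive dec_values[0] maps to model.label[0] in the original code, this is the probability of model.label[0].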
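The conversion described above can be sketched as follows. This is a minimal illustration, not code from liblinear: the class name `DecisionValueToProbability`, the helper `sigmoid`, and the sample decision values are all made up for the example, and the result is only meaningful as a probability when the model was trained with a logistic regression solver.

```java
class DecisionValueToProbability {

    // Logistic (sigmoid) function: maps any real decision value into (0, 1).
    // Hypothetical helper, not part of liblinear.
    static double sigmoid(double decisionValue) {
        return 1.0 / (1.0 + Math.exp(-decisionValue));
    }

    public static void main(String[] args) {
        // Made-up decision values (w·x); these are NOT outputs of the model above.
        double[] decisionValues = { -2.0, 0.0, 2.0 };
        for (double d : decisionValues) {
            double pFirst = sigmoid(d);     // probability of model.label[0]
            double pSecond = 1.0 - pFirst;  // probability of model.label[1]
            System.out.printf("dec=%.1f  P(label[0])=%.4f  P(label[1])=%.4f%n",
                    d, pFirst, pSecond);
        }
    }
}
```

A decision value of 0 maps to 0.5, and the two class probabilities always sum to 1. As far as I know, liblinear-java's `Linear` class also offers `predictProbability(Model, Feature[], double[])` for logistic regression models, which may make patching the library unnecessary; it is worth checking against the version you use.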

Also, make sure to select logistic regression as the solver:

     SolverType.L2R_LR
