java - Saving and loading a trained Stanford classifier in Java

Question

I have a dataset of 1 million labelled sentences and am using it to find sentiment with maximum entropy. I am using the Stanford Classifier:

import edu.stanford.nlp.classify.Classifier;
import edu.stanford.nlp.classify.ColumnDataClassifier;
import edu.stanford.nlp.ling.Datum;

public class MaximumEntropy {

static ColumnDataClassifier cdc;

public static float calMaxEntropySentiment(String text) {
    initializeProperties();
    float sentiment = (getMaxEntropySentiment(text));
    return sentiment;
}

public static void initializeProperties() {
    // Configure the classifier from the properties file
    cdc = new ColumnDataClassifier(
            "\\stanford-classifier-2016-10-31\\properties.prop");
}

public static int getMaxEntropySentiment(String tweet) {

    // TwitterUtils is the asker's own helper class
    String filteredTweet = TwitterUtils.filterTweet(tweet);
    System.out.println("Reading training file");
    // Note: the classifier is retrained from the full training file on every call
    Classifier<String, String> cl = cdc.makeClassifier(cdc.readTrainingExamples(
            "\\stanford-classifier-2016-10-31\\labelled_sentences.txt"));

    Datum<String, String> d = cdc.makeDatumFromLine(filteredTweet);
    System.out.println(filteredTweet + "  ==>  " + cl.classOf(d) + " " + cl.scoresOf(d));
    // System.out.println("Class score is: " +
    // cl.scoresOf(d).getCount(cl.classOf(d)));
    // Compare string contents with equals(); == only compares references
    if ("0".equals(cl.classOf(d))) {
        return 0;
    } else {
        return 4;
    }
}
}

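My data is labelled 0 or 1. Right now the entire dataset is read for every tweet, which takes a lot of time given the size of the dataset. My question is whether there is a way to train the classifier first and then load it when finding the sentiment of a tweet. I think this approach would take less time. Please correct me if I am wrong. The following link covers this, but it has nothing for the Java API: Saving and Loading Classifier. Any help would be appreciated.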

score 2 · Accepted Answer

Yes; the easiest way is to serialize the classifier using Java's default serialization mechanism. A useful helper is the IOUtils class:

IOUtils.writeObjectToFile(classifier, "/path/to/file");

To read the classifier:

Classifier<String, String> cl = IOUtils.readObjectFromFile(new File("/path/to/file"));
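
To illustrate the train-once, save, then load round trip, here is a minimal sketch that uses only java.io so it runs without the Stanford jars. KeywordModel, save, load and the temp-file name are hypothetical stand-ins made up for this example, not part of the Stanford API. The assumption is that a trained classifier can be written and read back the same way because it is serializable, which is what the IOUtils calls rely on.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: train once, serialize the model, deserialize it for later predictions. */
final class SavedModelDemo {

    /** Hypothetical stand-in for a trained classifier; any Serializable object works the same way. */
    static final class KeywordModel implements Serializable {
        private static final long serialVersionUID = 1L;

        // word -> (occurrences under label "1") minus (occurrences under label "0")
        private final Map<String, Integer> weights = new HashMap<>();

        /** Each example is {label, text}, mirroring a two-column training file. */
        static KeywordModel train(List<String[]> examples) {
            KeywordModel model = new KeywordModel();
            for (String[] example : examples) {
                int delta = example[0].equals("1") ? 1 : -1;
                for (String word : example[1].toLowerCase().split("\\s+")) {
                    model.weights.merge(word, delta, Integer::sum);
                }
            }
            return model;
        }

        /** Returns "1" if the summed word weights are positive, otherwise "0". */
        String classOf(String text) {
            int score = 0;
            for (String word : text.toLowerCase().split("\\s+")) {
                score += weights.getOrDefault(word, 0);
            }
            return score > 0 ? "1" : "0";
        }
    }

    /** Writes the model with Java's default serialization. */
    static void save(Serializable model, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(model);
        }
    }

    /** Reads back a model previously written by save(). */
    static KeywordModel load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (KeywordModel) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("model", ".ser");

        // Expensive step, done once: train and save.
        KeywordModel trained = KeywordModel.train(List.of(
                new String[] {"1", "great happy day"},
                new String[] {"0", "terrible sad day"}));
        save(trained, file);

        // Cheap step, done at prediction time: load and classify.
        KeywordModel loaded = load(file);
        System.out.println(loaded.classOf("what a great day"));

        Files.deleteIfExists(file);
    }
}
```

Applied to the code in the question, this would mean calling makeClassifier once, saving the result, and loading it into a static field at startup, so that getMaxEntropySentiment no longer rereads the training file for each tweet. The ColumnDataClassifier would presumably still be needed for makeDatumFromLine.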

