python - Reproducible results for a maximum entropy (maxent) classifier

Question

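I am trying to update the baseline code in nltk.classify.rte_classify to add more features and improve the model's accuracy. It uses MaxentClassifier. My problem is that I get a different accuracy every time I run the code (the results are listed after the code). With scikit-learn classifiers, there is usually a 'random_state' parameter for getting reproducible results. I would like to do the same with MaxentClassifier. I checked its documentation, but I could not find anything similar to the random_state of scikit-learn classifiers.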

import nltk  # needed for nltk.MaxentClassifier below
import nltk.classify.rte_classify as classify
from nltk.classify.util import accuracy
from nltk.corpus import rte as rte_corpus


def rte_classifier(algorithm):
    # Train on the RTE1-3 development pairs, evaluate on the RTE1 test pairs
    train_set = rte_corpus.pairs(['rte1_dev.xml', 'rte2_dev.xml', 'rte3_dev.xml'])
    test_set = rte_corpus.pairs(['rte1_test.xml'])
    featurized_train_set = classify.rte_featurize(train_set)
    featurized_test_set = classify.rte_featurize(test_set)
    # Train the classifier
    print('Training classifier...')
    if algorithm in ['GIS', 'IIS']:  # Use default GIS/IIS MaxEnt algorithm
        clf = nltk.MaxentClassifier.train(featurized_train_set, algorithm)
    else:
        err_msg = (
            "RTEClassifier only supports these algorithms:\n "
            " 'GIS', 'IIS'.\n")
        raise Exception(err_msg)
    # Evaluate on the held-out test set
    print('Testing classifier...')
    acc = accuracy(clf, featurized_test_set)
    print('Accuracy: %6.4f' % acc)
    return clf


rte_classifier('GIS')

Run 1: Accuracy: 0.5929
Run 2: Accuracy: 0.5908
Run 3: Accuracy: 0.5854
Run 4: Accuracy: 0.5913

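The variation in accuracy on this test set may look small, but on my own dataset, which has a large number of features, the difference sometimes reaches 10%.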
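
One thing that may be worth ruling out is Python's string hash randomization: in Python 3, the iteration order of a set of strings can change from one interpreter run to the next unless the PYTHONHASHSEED environment variable is fixed. Whether this is what drives the varying MaxentClassifier accuracy is an assumption, not something confirmed here. A minimal sketch of the check, using only the standard library; the helper name run_with_hash_seed and the snippet it runs are hypothetical stand-ins for the real training script:

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the training script: it prints the iteration
# order of a set of strings, which depends on the interpreter's hash seed.
SNIPPET = "print(list({'text', 'hyp', 'overlap', 'neg', 'extra'}))"


def run_with_hash_seed(code, seed):
    """Run `code` in a fresh interpreter with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Two separate interpreter runs started with the same seed are compared.
    first = run_with_hash_seed(SNIPPET, 0)
    second = run_with_hash_seed(SNIPPET, 0)
    print(first == second)
```

The same idea can be tried on the real script by launching it with PYTHONHASHSEED set to a fixed value in the environment and comparing the reported accuracy across runs. If the numbers still differ, the variation comes from somewhere else.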
