classification - Classifying new instances in Weka

Question

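On our training set, we performed feature selection (e.g. CfsSubsetEval with GreedyStepwise) and then classified the instances with a classifier (e.g. J48). We saved the model that Weka created.

Now we want to classify new [unlabeled] instances, which still have the original number of attributes that the training set had before feature selection. Are we right to assume that we should perform feature selection on this set of new [unlabeled] instances so that we can re-evaluate it with the saved model (making the training and test sets compatible)? If so, how do we filter the test set?

Thanks for your help!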

score 0 · Accepted Answer

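Yes, the test set and the training set must have the same number of attributes, and each attribute must correspond to the same thing. So before classifying, you should remove from the test set the same attributes that were removed from the training set.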
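To make the idea concrete without depending on Weka, here is a minimal plain-Java sketch. It assumes feature selection on the training set has already produced a list of retained column indices; the class name `SameAttributeProjection`, the method `project`, and the sample values are all hypothetical and exist only for illustration. The point is that the indices are decided once, from the training data, and then reused unchanged on every new row.

```java
import java.util.Arrays;

public class SameAttributeProjection {

    // Keep only the columns listed in keptIndices, in that order.
    static double[] project(double[] row, int[] keptIndices) {
        double[] out = new double[keptIndices.length];
        for (int i = 0; i < keptIndices.length; i++) {
            out[i] = row[keptIndices[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed outcome of feature selection on the training set:
        // columns 0 and 3 are retained (3 stands in for the class column).
        int[] keptIndices = {0, 3};

        double[] trainRow = {5.1, 3.5, 1.4, 0.0};
        double[] newRow = {6.2, 2.9, 4.3, Double.NaN}; // class value unknown

        // Both rows go through the same projection, so they stay compatible.
        System.out.println(Arrays.toString(project(trainRow, keptIndices)));
        System.out.println(Arrays.toString(project(newRow, keptIndices)));
    }
}
```

Feature selection is not run again on the new data here; rerunning it could pick different columns, and the saved model would then see attributes it was not trained on.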

score 0 · Accepted Answer

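I don't think you need to perform feature selection on the test set. If your test set still has the original number of attributes, load it and, in the Preprocess window, manually remove all the attributes that were removed from the training set file during feature selection.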

score 0 · Accepted Answer

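You must apply to the test set the same filter that you previously applied to the training set. You can also use the WEKA API to apply the same filter to the test set.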

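import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GreedyStepwise;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

Instances trainSet = //get training set (its class index must already be set)
Instances testSet = //get testing set (same attributes and class index as trainSet)

//run feature selection on the training data only
AttributeSelection attsel = new AttributeSelection();
CfsSubsetEval ws = new CfsSubsetEval();
GreedyStepwise search = new GreedyStepwise();
attsel.setEvaluator(ws);
attsel.setSearch(search);
attsel.SelectAttributes(trainSet);

//get indices of selected attributes (for a supervised evaluator this normally includes the class index)
int[] retArr = attsel.selectedAttributes();

//set up the filter for removing attributes
Remove remove = new Remove();
remove.setAttributeIndicesArray(retArr);
remove.setInvertSelection(true);//retain the selected, remove all others
remove.setInputFormat(trainSet);
trainSet = Filter.useFilter(trainSet, remove);

//now apply the same filter to the testing set as well
testSet = Filter.useFilter(testSet, remove);

//now you are good to go!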
