I'm trying to use Mulan to classify some data, but I get an exception:
mulan.data.DataLoadException: Error creating Instances data from supplied Reader data source
at mulan.data.MultiLabelInstances.loadInstances(MultiLabelInstances.java:469)
at mulan.data.MultiLabelInstances.loadInstances(MultiLabelInstances.java:458)
at mulan.data.MultiLabelInstances.<init>(MultiLabelInstances.java:168)
The main function is taken from mulan.examples.TrainTestExperiment:
public class TrainTestExperiment {

    public static void main(String[] args) {
        try {
            String path = Utils.getOption("path", args); // e.g. -path dataset/
            String filestem = Utils.getOption("filestem", args); // e.g. -filestem emotions
            String percentage = Utils.getOption("percentage", args); // e.g. -percentage 50 (for 50%)

            System.out.println("Loading the dataset");
            MultiLabelInstances mlDataSet = new MultiLabelInstances(path + filestem + ".arff", path + filestem + ".xml");

            // split the data set into train and test
            Instances dataSet = mlDataSet.getDataSet();
            RemovePercentage rmvp = new RemovePercentage();
            rmvp.setInvertSelection(true);
            rmvp.setPercentage(Double.parseDouble(percentage));
            rmvp.setInputFormat(dataSet);
            Instances trainDataSet = Filter.useFilter(dataSet, rmvp);

            rmvp = new RemovePercentage();
            rmvp.setPercentage(Double.parseDouble(percentage));
            rmvp.setInputFormat(dataSet);
            Instances testDataSet = Filter.useFilter(dataSet, rmvp);

            MultiLabelInstances train = new MultiLabelInstances(trainDataSet, path + filestem + ".xml");
            MultiLabelInstances test = new MultiLabelInstances(testDataSet, path + filestem + ".xml");

            Evaluator eval = new Evaluator();
            Evaluation results;

            Classifier brClassifier = new NaiveBayes();
            BinaryRelevance br = new BinaryRelevance(brClassifier);
            br.setDebug(true);
            br.build(train);
            results = eval.evaluate(br, test);
            System.out.println(results);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
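The trace above shows only the DataLoadException frames. If that exception wraps an underlying parser error, the message of the wrapped cause would likely name the offending line of the .arff file. Here is a minimal sketch of walking the cause chain using only the JDK. The class name CauseChain, the exception types, and the messages in main are stand-ins for illustration, not real Mulan output, and whether DataLoadException actually carries a cause is an assumption.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CauseChain {

    /** Returns "ClassName: message" for the exception and each nested cause, outermost first. */
    public static List<String> describe(Throwable t) {
        List<String> lines = new ArrayList<>();
        // Cap the depth so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 20; depth++) {
            lines.add(t.getClass().getName() + ": " + t.getMessage());
            t = t.getCause();
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in exceptions; in the real program this would be the
        // exception caught in the catch block of TrainTestExperiment.
        Exception root = new IOException("hypothetical parser message");
        Exception wrapped = new RuntimeException("hypothetical wrapper message", root);
        for (String line : describe(wrapped)) {
            System.out.println(line);
        }
    }
}
```

In the catch block, calling describe(e) and printing each entry would show whatever the innermost exception reports, if there is one.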
As for the data format, I have one dimension called title, with 160 classes.
The data file is formatted as ARFF.
Some of the text is in Chinese.
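Since some of the text is Chinese, one thing worth ruling out is the file encoding. Below is a minimal sketch that checks whether a file decodes as strict UTF-8, using only the JDK. The class name ArffEncodingCheck is made up for illustration. Which charset Mulan or Weka uses when opening the file is not something I know, so this only tells you what encoding the file is in, not whether the loader reads it that way.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ArffEncodingCheck {

    /** Returns true if the whole file decodes as UTF-8 without malformed byte sequences. */
    public static boolean isValidUtf8(Path file) throws IOException {
        // Files.newBufferedReader reports malformed input as an exception
        // instead of silently substituting replacement characters.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                // Only decoding matters here; the line content is ignored.
            }
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ArffEncodingCheck <file.arff>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + (isValidUtf8(file) ? " is valid UTF-8" : " is NOT valid UTF-8"));
    }
}
```

If the file turns out not to be UTF-8 (for example, it was saved in a legacy Chinese encoding), re-saving it as UTF-8 and running the JVM with -Dfile.encoding=UTF-8 might be worth trying, though that is a guess rather than a confirmed fix.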
Any help is appreciated.
Regards