I am training a Mahout classifier on my own data. I ran the following commands to create the Mahout model:

./bin/mahout seqdirectory -i /tmp/mahout-work-root/MyData-all -o /tmp/mahout-work-root/MyData-seq

./bin/mahout seq2sparse -i /tmp/mahout-work-root/MyData-seq -o /tmp/mahout-work-root/MyData-vectors -lnorm -nv -wt tfidf

./bin/mahout split -i /tmp/mahout-work-root/MyData-vectors/tfidf-vectors --trainingOutput /tmp/mahout-work-root/MyData-train-vectors --testOutput /tmp/mahout-work-root/MyData-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential

./bin/mahout trainnb -i /tmp/mahout-work-root/Mydata-train-vectors -el -o /tmp/mahout-work-root/model -li /tmp/mahout-work-root/labelindex -ow

When I try to create the model with the trainnb command, I get the following exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.mahout.classifier.naivebayes.BayesUtils.writeLabelIndex(BayesUtils.java:119)
    at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.createLabelIndex(TrainNaiveBayesJob.java:152)

What could be the problem here?

Note: The original example mentioned here works fine.

1 Answer

I think this may be a problem with how you have laid out your training files. The files should be organized as follows:

MyData-all

\classA

 -file1
 -file2
 -...

\classB

 -filex

...
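
One plausible reading of the exception, which I have not verified against the Mahout source: with -el, trainnb takes each label from the sequence-file key, and seqdirectory builds that key from the file's path relative to the input directory. A file sitting directly under MyData-all would then have no category component in its key, and index 1 would be out of bounds. Under that assumption, a quick check of the input directory before running seqdirectory could look like the sketch below. check_layout is a hypothetical helper written for this answer, not a Mahout command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): checks that every entry directly
# under the input directory is a category directory holding at least one file.
# Usage: check_layout /tmp/mahout-work-root/MyData-all
check_layout() {
    root=$1
    status=0
    found=0
    for entry in "$root"/*; do
        [ -e "$entry" ] || continue   # glob did not match: nothing in root
        found=1
        if [ -d "$entry" ]; then
            # Count the documents that would receive this directory as label.
            count=$(find "$entry" -type f | wc -l | tr -d ' ')
            echo "category $(basename "$entry"): $count file(s)"
            [ "$count" -gt 0 ] || status=1
        else
            # A file at the top level has no category directory above it.
            echo "stray file outside any category: $(basename "$entry")"
            status=1
        fi
    done
    [ "$found" -eq 1 ] || status=1    # an empty or missing root also fails
    return "$status"
}
```

A non-zero return status means the layout differs from the tree above. If the layout looks right and the error persists, dumping the keys of the generated sequence files with mahout seqdumper -i should show whether each key carries a category prefix.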

answered 2013-01-19T07:55:19.340