java - IndexOutOfBoundsException in Mahout

Question

I am trying to run the Mahout SGD classifier on a CSV file, but I get this error:

 
[vineet@localhost bin]$ ./mahout trainlogistic --input ./filtered.csv --output model --target target --categories 33 \
--features 200 --passes 10 --predictors subject --types text --rate 50

hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 6, Size: 4
        at java.util.ArrayList.rangeCheck(ArrayList.java:604)
        at java.util.ArrayList.get(ArrayList.java:382)
        at org.apache.mahout.classifier.sgd.CsvRecordFactory.processLine(CsvRecordFactory.java:245)
        at org.apache.mahout.classifier.sgd.TrainLogistic.mainToOutput(TrainLogistic.java:85)
        at org.apache.mahout.classifier.sgd.TrainLogistic.main(TrainLogistic.java:65)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)

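The CSV file contains Unicode text and large text fields enclosed in quote characters.

I have tried the classifier on the sample donut.csv and it works fine. I also tried changing my header row to something like "id", "subject", "field2", and so on, but it still does not work.

What am I doing wrong?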

score 1 · Accepted Answer

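Some rows may be dirty. The exception reports a row that was parsed into only 4 fields while the code requested the field at index 6, so that row has fewer columns than the header implies. Check your data again, or try feeding in just a single row of data to verify my guess.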
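
One way to check for short rows is to count the fields on each physical line and compare against the header. The sketch below is a hypothetical standalone helper (the class name `CsvRowChecker` and its methods are made up for illustration). It is not Mahout's own parser, and it assumes comma-separated fields with double-quote quoting. It reads the file line by line, so a quoted field that contains a line break will show up as two short rows, which may be what is happening with large quoted text fields.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout: flags lines whose field count
// differs from the header line's field count.
class CsvRowChecker {

    // Counts comma-separated fields on one physical line. Commas inside
    // double quotes are treated as part of the field. A doubled quote ("")
    // toggles the state twice, so it leaves the quoted state unchanged.
    static int countFields(String line) {
        int fields = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }

    // Returns the 1-based numbers of all lines after the header whose
    // field count differs from the header's.
    static List<Integer> findBadLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        if (lines.isEmpty()) {
            return bad;
        }
        int expected = countFields(lines.get(0));
        for (int i = 1; i < lines.size(); i++) {
            if (countFields(lines.get(i)) != expected) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CsvRowChecker <file.csv>");
            return;
        }
        // Assumes the file is UTF-8 encoded.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (int n : findBadLines(lines)) {
            System.out.println("line " + n + ": " + countFields(lines.get(n - 1)) + " fields");
        }
    }
}
```

Running it as `java CsvRowChecker filtered.csv` prints each suspicious line number with its field count. If nothing is printed, every line has the same number of fields as the header under these assumptions, and the cause is probably elsewhere.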
