I downloaded the latest update of the text classification template. I created a new app and imported stopwords.json and emails.json, specifying the app ID:
$ pio import --appid <appID> --input data/stopwords.json
$ pio import --appid <appID> --input data/emails.json
I then edited engine.json and set my app name in it:
{
  "id": "default",
  "description": "Default settings",
  "engineFactory": "org.template.textclassification.TextClassificationEngine",
  "datasource": {
    "params": {
      "appName": "<myapp>",
      "evalK": 3
    }
But the next step, evaluation, fails with the error empty.maxBy. Part of the error is pasted below:
[INFO] [Engine$] Preparator: org.template.textclassification.Preparator@79a13920
[INFO] [Engine$] AlgorithmList: List(org.template.textclassification.LRAlgorithm@420a8042)
[INFO] [Engine$] Serving: org.template.textclassification.Serving@faea4da
Exception in thread "main" java.lang.UnsupportedOperationException: empty.maxBy
at scala.collection.TraversableOnce$class.maxBy(TraversableOnce.scala:223)
at scala.collection.AbstractTraversable.maxBy(Traversable.scala:105)
at org.template.textclassification.PreparedData.<init> (Preparator.scala:160)
at org.template.textclassification.Preparator.prepare(Preparator.scala:39)
at org.template.textclassification.Preparator.prepare(Preparator.scala:35)
at io.prediction.controller.PPreparator.prepareBase(PPreparator.scala:34)
at io.prediction.controller.Engine$$anonfun$25.apply(Engine.scala:758)
at scala.collection.MapLike$MappedValues.get(MapLike.scala:249)
at scala.collection.MapLike$MappedValues.get(MapLike.scala:249)
at scala.collection.MapLike$class.apply(MapLike.scala:140)
at scala.collection.AbstractMap.apply(Map.scala:58)
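As far as I can tell, this exception is what Scala throws whenever maxBy is called on an empty collection, so one possible reading of the trace is that PreparedData received no data at Preparator.scala:160. The sketch below only reproduces the exception in plain Scala. MaxByDemo, Doc, longestUnsafe and longestSafe are hypothetical names for illustration and are not taken from the template.

```scala
import scala.util.Try

object MaxByDemo {
  // Hypothetical stand-in for a labelled document; not a class from the template.
  final case class Doc(text: String, label: Double)

  // Unsafe: maxBy throws UnsupportedOperationException on an empty collection.
  def longestUnsafe(docs: Seq[Doc]): Doc = docs.maxBy(_.text.length)

  // Safe variant: returns None when there is nothing to compare.
  def longestSafe(docs: Seq[Doc]): Option[Doc] =
    if (docs.isEmpty) None else Some(docs.maxBy(_.text.length))

  def main(args: Array[String]): Unit = {
    val noDocs = Seq.empty[Doc]
    // Prints a Failure wrapping the UnsupportedOperationException.
    println(Try(longestUnsafe(noDocs)))
    // Prints None instead of throwing.
    println(longestSafe(noDocs))
  }
}
```

If this is indeed the cause, the thing to check would be whether the imported events are actually visible under the app name given in engine.json.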
I then tried pio train, but training also failed after printing some observations. The error shown is java.lang.OutOfMemoryError: Java heap space. Part of the error is pasted below:
[INFO] [Engine$] Data santiy check is on.
[INFO] [Engine$] org.template.textclassification.TrainingData supports data sanity check. Performing check.
Observation 1 label: 1.0
Observation 2 label: 0.0
Observation 3 label: 0.0
Observation 4 label: 1.0
Observation 5 label: 1.0
[INFO] [Engine$] org.template.textclassification.PreparedData does not support data sanity check. Skipping check.
[WARN] [BLAS] Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
[WARN] [BLAS] Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
[INFO] [Engine$] org.template.textclassification.NBModel does not support data sanity check. Skipping check.
[INFO] [Engine$] EngineWorkflow.train completed
[INFO] [Engine] engineInstanceId=AU3g4XyhTrUUakX3xepP
[INFO] [CoreWorkflow$] Inserting persistent model
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at com.esotericsoftware.kryo.io.Output.flush(Output.java:155)
at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:36)
at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:33)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
at com.twitter.chill.TraversableSerializer$$anonfun$write$1.apply(Traversable.scala:29)
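Is this caused by insufficient memory? I have run the previous version of the same template with more than 40 MB of text classification data without any problems. Is evaluation a prerequisite for training? Also, could you explain how the evaluation is performed?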