我有一些 Spark 代码,我使用 Kryo 序列化。当没有服务器发生故障时,一切都运行良好,但是当服务器发生故障时,我会遇到一些大问题,因为它试图自行恢复。基本上,错误消息表明我的Article
课程对服务器来说是未知的。
Job aborted due to stage failure: Task 29 in stage 4.0 failed 4 times, most recent failure: Lost task 29.3 in stage 4.0 (TID 316, DATANODE-3): com.esotericsoftware.kryo.KryoException: Unable to find class: $line50.$read$$iwC$$iwC$Article
com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:235)
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
我真的很难理解我做错了什么......
我在我的地图之外声明这些类
case class Contrib ( contribType: Option[String], surname: Option[String], givenNames: Option[String], phone: Option[String], email: Option[String], fax: Option[String] )
// Class to hold references
case class Reference( idRef:Option[String], articleNameRef:Option[String], pmIDFrom: Option[Long], pmIDRef:Option[Long])
// Class to hold articles
case class Article(articleName:String, articleAbstract: Option[String],
pmID:Option[Long], doi:Option[String],
references: Iterator[Reference],
contribs: Iterator[Contrib],
keywords: List[String])
似乎有些执行者不再知道 anArticle
是什么......我该如何解决这个问题?
谢谢,斯蒂芬