2

我有一些 Spark 代码,我使用 Kryo 序列化。当没有服务器发生故障时,一切都运行良好,但是当服务器发生故障时,我会遇到一些大问题,因为它试图自行恢复。基本上,错误消息表明我的Article课程对服务器来说是未知的。

Job aborted due to stage failure: Task 29 in stage 4.0 failed 4 times, most recent failure: Lost task 29.3 in stage 4.0 (TID 316, DATANODE-3): com.esotericsoftware.kryo.KryoException: Unable to find class: $line50.$read$$iwC$$iwC$Article
        com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
        com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
        com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
        com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
        org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
        org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
        org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
        org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:235)
        org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
        org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)

我真的很难理解我做错了什么......

我在我的地图之外声明这些类

case class Contrib ( contribType: Option[String], surname: Option[String], givenNames: Option[String], phone: Option[String], email: Option[String], fax: Option[String] )

// Class to hold references
case class Reference( idRef:Option[String], articleNameRef:Option[String], pmIDFrom: Option[Long], pmIDRef:Option[Long])

// Class to hold articles
case class Article(articleName:String, articleAbstract: Option[String],
                   pmID:Option[Long], doi:Option[String],
                   references: Iterator[Reference],
                   contribs: Iterator[Contrib],
                   keywords: List[String])

似乎有些执行者不再知道 anArticle是什么......我该如何解决这个问题?

谢谢,斯蒂芬

4

0 回答 0