3

在尝试通过 Spark 流(Kafka 直接 API)使用来自 Kafka 的消息时,我遇到了以下错误。这在使用 Spark 独立集群管理器时可以正常工作。我们刚刚切换到使用 Cloudera 5.7 使用 Yarn 来管理 Spark 集群并开始看到以下错误。

一些细节: - Spark 1.6.0 - 使用 Kafka 直接流 API - Kafka 代理版本 (0.8.2.1) - Yarn 执行器的类路径中的 Kafka 版本 (0.9) - 不受 Cloudera 管理的 Kafka 代理

我看到使用独立集群管理器和纱线之间的唯一区别是在消费者端使用的 Kafka 版本。(0.8.2.1 对 0.9)

试图弄清楚版本不匹配是否真的是一个问题?如果确实如此,除了将 Kafka 代理升级到 0.9 之外,还有什么解决方法。(最终是,但现在不是)

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 200.0 failed 4 times, most recent failure: Lost task 0.3 in stage 200.0 (TID 203,..): java.nio.BufferUnderflowException at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151) at java.nio.ByteBuffer.get(ByteBuffer.java:715) at kafka.api.ApiUtils$.readShortString(ApiUtils.scala:40) at kafka.api.TopicData$.readFrom(FetchResponse.scala:96) at kafka.api.FetchResponse$$anonfun$4.apply(FetchResponse.scala:170) at kafka.api.FetchResponse$$anonfun$4.apply(FetchResponse.scala:169) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.immutable.Range.foreach(Range.scala:141) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) at kafka.api.FetchResponse$.readFrom(FetchResponse.scala:169) at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:135) at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.fetchBatch(KafkaRDD.scala:192) at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.getNext(KafkaRDD.scala:208) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:942) at org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:942) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)

Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

4

0 回答 0