
I am completely new to Camus and Hadoop, and I am running into an exception. I am trying to write some avro files to hdfs, and I keep getting the following error block:

[EtlMultiOutputRecordWriter] - ExceptionWritable key: topic=_schemas partition=0leaderId=0 server= service= beginOffset=0 offset=0 msgSize=1024 server= checksum=0 time=1450371931447 value: java.lang.Exception
at com.linkedin.camus.etl.kafka.common.KafkaReader.getNext(KafkaReader.java:108)
at com.linkedin.camus.etl.kafka.mapred.EtlRecordReader.nextKeyValue(EtlRecordReader.java:232)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
... 14 more

I looked at line 108 of com.linkedin.camus.etl.kafka.common.KafkaReader.getNext and found it to be: MessageAndOffset msgAndOffset = messageIter.next();
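
One detail in the trace seems telling: the cause prints "... 14 more" with no frames of its own, which is the pattern produced when a wrapper copies the cause's stack trace onto itself. Here is a minimal, self-contained sketch that reproduces that shape (my assumption about how Camus wraps the failure, not the actual Camus source):

    public class CauseTraceDemo {
        public static void main(String[] args) {
            try {
                Object messageIter = null;  // stand-in for a null message iterator
                messageIter.hashCode();     // the line where the NPE is really thrown
            } catch (Throwable t) {
                // Assumed Camus-style wrapping: copying the cause's stack onto the
                // wrapper makes the wrapper's trace begin at the cause's throw site,
                // so the cause collapses to "... N more" with no frames of its own.
                Exception e = new Exception(t.getLocalizedMessage(), t);
                e.setStackTrace(t.getStackTrace());
                e.printStackTrace();
            }
        }
    }

If that reading is right, the NullPointerException is raised at line 108 itself, which would mean messageIter is null when next() is called.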

I am using io.confluent.camus.etl.kafka.coders.AvroMessageDecoder as my decoder and com.linkedin.camus.example.DummySchemaRegistry as my coder's schema registry.
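
For context, these two classes are normally wired up in camus.properties along these lines (a sketch using the property names from the Camus example config, with the classes named above):

    # Sketch: decoder and schema registry wiring in camus.properties
    camus.message.decoder.class=io.confluent.camus.etl.kafka.coders.AvroMessageDecoder
    kafka.message.coder.schema.registry.class=com.linkedin.camus.example.DummySchemaRegistry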

At the end of the log I get another line indicating an error from one of the hdfs files: Error from file [hdfs://localhost:9000/user/username/exec/2015-12-17-17-05-25/errors-m-00000]. The errors-m-00000 file starts out somewhat readable but then turns into an indecipherable string:

SEQ*com.linkedin.camus.etl.kafka.common.EtlKey5com.linkedin.camus.etl.kafka.common.ExceptionWritable*org.apache.hadoop.io.compress.DefaultCodec|Ò ∫±ß˝}pºHí$ò¸ ·:0 schemasQ∞ΔøÿxúïîÀN√0E7l‡+∫»¢lFMõ> á*êxU®™ËzÍmàc[ÆÕ„XÚÕÿqZ%@[ÿD±gÓô…¯∆üGœ¯Ç¿Q,·Úçë2ô'«hZL¿3ëSöXÿ5ê·ê„Sé‡ÇÖpÎS¬î4,...LËÕ¥Î{û}wFßáâ*M)>%&uZÑCfi“˚#rKÌÔ¡flÌu^Í%† B∂"Xa*•⁄0ÔQÕpùGzùidy&ñªkT...Å›Ô^≥-#0>›...ΔRG∫.ˇÅ¨«JÚ®sÃ≥Ö¡\£Rîfi˚ßéT≥D#%T8ãW® ÚµÌ∫4N˙©W∫©mst√—Ô嶥óhÓ$C~#S+Ñâ{ã ÇFL¡ßí⁄L´ÏíÙºÙΩ5wfÃjM¬∏_Äò5RØ£ Ë"Eeúÿëx{ÆÏ«{XW÷XM€O¨- C#É¡Òl•ù9§‰õö2ó:wɲ%Œ-N∫ˇbFXˆ∑:àá5fyQÑ'ö™:roõ1⁄5•≠≈˚yM0±ú?»ÃW◊.h≈I´êöNæ [û3
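
That readable prefix is actually informative: SEQ is the Hadoop SequenceFile magic number, and the header names the key/value classes (EtlKey and ExceptionWritable) and the compression codec (DefaultCodec), so the rest of the file is binary by design and is better dumped programmatically than with cat. A minimal reader sketch, assuming Hadoop 2.x APIs and the Camus jars on the classpath so the key/value classes resolve:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class DumpErrorsFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Path copied from the log line above
            Path path = new Path("hdfs://localhost:9000/user/username/exec/"
                    + "2015-12-17-17-05-25/errors-m-00000");
            try (SequenceFile.Reader reader =
                     new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
                // Key/value classes come from the file header (EtlKey and
                // ExceptionWritable); DefaultCodec decompression is transparent.
                Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
                while (reader.next(key, value)) {
                    System.out.println(key + " => " + value);
                }
            }
        }
    }

Reading it this way should print the full ExceptionWritable stack trace for each failed record, which may say more than the wrapped NullPointerException above.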

Finally, going by the timing report, the hadoop job appears to have run, but a commit never happened:

Job time (seconds):
   pre setup    1.0 (11%)
  get splits    1.0 (11%)
  hadoop job    4.0 (44%)
      commit    0.0 (0%)
Total: 0 minutes 9 seconds

Any help or ideas on where to look to resolve this would be greatly appreciated. Thank you.
