We have the Kafka HDFS connector writing to HDFS in the default Avro format. A sample output:
Obj^A^B^Vavro.schema"["null","string"]^@$ͳø{<9d>¾Ã^X:<8d>uV^K^H5^F°^F^B<8a>^B{"severity":"notice","message":"Test message","facility":"kern","syslog-tag":"sawmill_test:","timestamp":"2017-01-31T20:15:00+00:00"}^B<8a>^B{"severity":"notice","message":"Test message","facility":"kern","syslog-tag":"sawmill_test:","timestamp":"2017-01-31T20:15:00+00:00"}^B<8a>^B{"severity":"notice","message":"Test message","facility":"kern","syslog-tag":"sawmill_test:","timestamp":"2017-01-31T20:15:00+00:00"}$ͳø{<9d>¾Ã^X:<8d>uV^K^H5
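For reference, the schema embedded in the file header can be dumped with the plain Avro Java API. This is only a minimal sketch, assuming the file has been copied off HDFS to a placeholder local path such as /tmp/sample.avro:

import java.io.File
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.GenericDatumReader

// Open the Avro object container file and print the writer schema stored in its header.
val reader = new DataFileReader[Any](new File("/tmp/sample.avro"), new GenericDatumReader[Any]())
println(reader.getSchema)   // prints ["null","string"] for the file shown above
reader.close()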
Trying to read it using:
import com.databricks.spark.avro._
val df = spark.read.avro("..path to avro file")
we get the following error:
java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType: ["null","string"]
at com.databricks.spark.avro.DefaultSource.inferSchema(DefaultSource.scala:93)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at com.databricks.spark.avro.package$AvroDataFrameReader$$anonfun$avro$2.apply(package.scala:34)
at com.databricks.spark.avro.package$AvroDataFrameReader$$anonfun$avro$2.apply(package.scala:34)
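Since the embedded schema is a bare ["null","string"] union rather than a record, we suspect this is why spark-avro cannot build a StructType from it. As a rough workaround sketch only (assuming the file is copied to a placeholder local path /tmp/sample.avro and is small enough to read on the driver), the string records can be pulled out with the plain Avro API and parsed as JSON, but this is not the approach we want to rely on:

import java.io.File
import scala.collection.JavaConverters._
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.GenericDatumReader

// Each Avro datum under the ["null","string"] schema is a JSON string;
// collect the strings and hand them to Spark's JSON reader.
val reader = new DataFileReader[Any](new File("/tmp/sample.avro"), new GenericDatumReader[Any]())
val jsonLines = reader.iterator().asScala.map(_.toString).toList
reader.close()

val df = spark.read.json(spark.sparkContext.parallelize(jsonLines))
df.printSchema()
df.show(false)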
Please help.
Spark version: 2.11
Spark-avro version: 2.11-3.2.0
Kafka version: 0.10.2.1