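I am trying to create a Hive table on top of Avro data using an avsc schema file, and I need to rename some of the columns. I used aliases, but they do not seem to work: when I query the table, the renamed columns come back as null.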
Saving the data from a Spark DataFrame
val data=Seq(("john","adams"),("john","smith"))
val columns = Seq("fname","lname")
import spark.sqlContext.implicits._
val df=data.toDF(columns:_*)
df.write.format("avro").save("/test")
AVSC schema file
{
"type" : "record",
"name" : "test",
"doc" : " import of test",
"fields" : [ {
"name" : "first_name",
"type" : [ "null", "string" ],
"default" : null,
"aliases" : [ "fname" ],
"columnName" : "fname",
"sqlType" : "12"
}, {
"name" : "last_name",
"type" : [ "null", "string" ],
"default" : null,
"aliases" : [ "lname" ],
"columnName" : "lname",
"sqlType" : "12"
} ],
"tableName" : "test"
}
External Hive table
create external table test
STORED AS AVRO
LOCATION '/test'
TBLPROPERTIES ('avro.schema.url'='/test.avsc');
Hive query
select last_name from test;
This returns null even though the Avro files contain data under the original field name (i.e. lname).
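For comparison, one possible workaround is to rename the columns in Spark before writing, so that the field names stored in the Avro files already match the names in the avsc file and no alias resolution is needed. The following is only a sketch under assumptions: the output path /test_renamed and the local SparkSession setup are hypothetical, and the spark-avro module is assumed to be on the classpath. It does not explain why the aliases are ignored.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session; in spark-shell the `spark` value already exists.
val spark = SparkSession.builder()
  .appName("rename-before-write")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the question.
val df = Seq(("john", "adams"), ("john", "smith")).toDF("fname", "lname")

// Rename the columns to the names declared in the avsc file.
val renamed = df
  .withColumnRenamed("fname", "first_name")
  .withColumnRenamed("lname", "last_name")

// Print the schema to confirm the field names that will be written.
renamed.printSchema()

// Write to a hypothetical new location so the original files stay untouched.
renamed.write.format("avro").mode("overwrite").save("/test_renamed")
```

With files written this way, the external table's LOCATION would need to point at the new directory, and the aliases in the avsc file would no longer be involved in the lookup.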