I'm trying to read Parquet files into Hive on Spark.
I found that I should do something like this:
CREATE TABLE avro_test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS AVRO TBLPROPERTIES ('avro.schema.url'='/files/events/avro_events_scheme.avsc');
CREATE EXTERNAL TABLE parquet_test LIKE avro_test STORED AS PARQUET LOCATION '/files/events/parquet_events/';
My Avro schema is:
{
"type" : "record",
"namespace" : "events",
"name" : "events",
"fields" : [
{ "name" : "category" , "type" : "string" },
{ "name" : "duration" , "type" : "long" },
{ "name" : "name" , "type" : "string" },
{ "name" : "user_id" , "type" : "string"},
{ "name" : "value" , "type" : "long" }
]
}
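If the end goal is only a Parquet-backed Hive table, one way to avoid both the Avro staging table and CREATE TABLE ... LIKE is to spell out the columns directly. The sketch below assumes a flat record whose fields all use Avro primitive types, and it builds the DDL from the schema above. It also checks that the top-level "type" is "record", since the Avro specification uses that type for schemas with "fields". The helper name parquet_ddl_from_avro and the AVRO_TO_HIVE mapping are illustrative and are not part of Hive or Spark.

```python
import json

# Illustrative mapping from Avro primitive types to Hive column types.
# Complex types (unions, arrays, maps, nested records) are deliberately not handled.
AVRO_TO_HIVE = {
    "string": "STRING",
    "long": "BIGINT",
    "int": "INT",
    "boolean": "BOOLEAN",
    "float": "FLOAT",
    "double": "DOUBLE",
    "bytes": "BINARY",
}

EVENTS_SCHEMA = """{
  "type": "record",
  "namespace": "events",
  "name": "events",
  "fields": [
    {"name": "category", "type": "string"},
    {"name": "duration", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "user_id", "type": "string"},
    {"name": "value", "type": "long"}
  ]
}"""


def parquet_ddl_from_avro(schema_json: str, table: str, location: str) -> str:
    """Build a CREATE EXTERNAL TABLE ... STORED AS PARQUET statement whose
    columns mirror the fields of a flat Avro record schema (hypothetical helper)."""
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError(f"top-level Avro type must be 'record', got {schema.get('type')!r}")
    columns = []
    for field in schema["fields"]:
        avro_type = field["type"]
        # Reject unions (lists), nested schemas (dicts) and unmapped primitives.
        if not isinstance(avro_type, str) or avro_type not in AVRO_TO_HIVE:
            raise ValueError(f"unsupported Avro type for {field['name']}: {avro_type!r}")
        # Backticks quote names such as `value` that may clash with keywords.
        columns.append(f"  `{field['name']}` {AVRO_TO_HIVE[avro_type]}")
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        + ",\n".join(columns)
        + f"\n)\nSTORED AS PARQUET\nLOCATION '{location}'"
    )


if __name__ == "__main__":
    print(parquet_ddl_from_avro(EVENTS_SCHEMA, "parquet_test", "/files/events/parquet_events/"))
```

The resulting string could then be run with spark.sql(ddl) on a SparkSession built with enableHiveSupport(). Whether Spark reads the existing files correctly still depends on the Parquet column names and types matching the ones declared here.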
As a result, I get this error:
org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: ROW FORMAT SERDE is incompatible with format 'avro',
which also specifies a serde(line 1, pos 0)
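The error message itself names the conflict. In Hive DDL, STORED AS AVRO is a shorthand that already selects org.apache.hadoop.hive.serde2.avro.AvroSerDe together with the Avro input and output formats, so adding ROW FORMAT SERDE specifies the SerDe a second time. Below is a minimal sketch of the first statement without the redundant clause. It is wrapped in a small Python helper (hypothetical name avro_table_ddl) so the string can be passed to spark.sql(...). It only covers the Avro table and does not address the CREATE EXTERNAL TABLE ... LIKE statement.

```python
def avro_table_ddl(table: str, schema_url: str) -> str:
    """Hive-format DDL for an Avro-backed table (hypothetical helper).

    STORED AS AVRO already implies the AvroSerDe, so no ROW FORMAT SERDE
    clause is emitted; combining the two is what Spark's parser rejects.
    """
    return (
        f"CREATE TABLE {table}\n"
        "STORED AS AVRO\n"
        f"TBLPROPERTIES ('avro.schema.url'='{schema_url}')"
    )


if __name__ == "__main__":
    print(avro_table_ddl("avro_test", "/files/events/avro_events_scheme.avsc"))
```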