I am looking for a way, using Hive, to take the AVSC file below and externalize the nested schema "RENTALRECORDTYPE" so that it can be reused by other schemas.
{
  "type": "record",
  "name": "EMPLOYEE",
  "namespace": "",
  "doc": "EMPLOYEE is a person that works here",
  "fields": [
    {
      "name": "RENTALRECORD",
      "type": {
        "type": "record",
        "name": "RENTALRECORDTYPE",
        "namespace": "",
        "doc": "Rental record is a record that is kept on every item rented",
        "fields": [
          {
            "name": "due_date",
            "doc": "The date when item is due",
            "type": "int"
          }
        ]
      }
    },
    {
      "name": "hire_date",
      "doc": "Employee date of hire",
      "type": "int"
    }
  ]
}
Defining the schema this way works fine. I can issue the following HiveQL statement and the table is created successfully.
CREATE EXTERNAL TABLE employee
STORED AS AVRO
LOCATION '/user/dtom/store/data/employee'
TBLPROPERTIES ('avro.schema.url'='/user/dtom/store/schema/employee.avsc');
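However, I would like to reference an existing schema instead of duplicating the record definition in multiple schemas. For example, rather than a single schema file, there would be two AVSC files: rentalrecord.avsc and employee.avsc.
rentalrecord.avsc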
{
  "type": "record",
  "name": "RENTALRECORD",
  "namespace": "",
  "doc": "A record that is kept for every rental",
  "fields": [
    {
      "name": "due_date",
      "doc": "The date on which the rental is due back to the store",
      "type": "int"
    }
  ]
}
employee.avsc
{
  "type": "record",
  "name": "EMPLOYEE",
  "namespace": "",
  "doc": "EMPLOYEE is a person that works for the VIDEO STORE",
  "fields": [
    {
      "name": "rentalrecord",
      "doc": "A rental record is a record on every rental",
      "type": "RENTALRECORD"
    },
    {
      "name": "hire_date",
      "doc": "Employee date of hire",
      "type": "int"
    }
  ]
}
In this scenario, the goal is to externalize the RENTALRECORD schema definition so that it can be reused in employee.avsc and elsewhere.
When I try to import the schemas with the following two HiveQL statements, it fails...
CREATE EXTERNAL TABLE rentalrecord
STORED AS AVRO
LOCATION '/user/dtom/store/data/rentalrecord'
TBLPROPERTIES ('avro.schema.url'='/user/dtom/store/schema/rentalrecord.avsc');
CREATE EXTERNAL TABLE employee
STORED AS AVRO
LOCATION '/user/dtom/store/data/employee'
TBLPROPERTIES ('avro.schema.url'='/user/dtom/store/schema/employee.avsc');
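rentalrecord.avsc is imported successfully, but employee.avsc fails on the first field definition, the field of type "RENTALRECORD". Hive outputs the following error...
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Encountered exception determining schema. Returning signal schema to indicate problem: "RENTALRECORD" is not a defined name. The type of the "rentalrecord" field must be a defined name or a {"type": ...} expression.)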
My research tells me that Avro does support this form of schema reuse. So either I am missing something, or this is something Hive does not support.
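As a stopgap, the two files could be kept separate and merged into one self-contained schema before Hive reads it. The following is a minimal Python sketch of that idea, not something Hive or Avro provides: `inline_named_types` is a hypothetical helper, it assumes every schema uses the empty namespace as in the files above, and it only handles records, arrays, maps and unions.

```python
import json


def inline_named_types(schema, definitions, seen=None):
    """Return a copy of `schema` in which the first reference to each
    externally defined name is replaced by its full definition.

    Hypothetical helper: `definitions` maps a type name to its parsed
    schema. Later references stay as names, since Avro does not allow
    a name to be defined twice.
    """
    seen = set() if seen is None else seen
    if isinstance(schema, str):
        # A bare string is a primitive or a reference to a named type.
        if schema in definitions and schema not in seen:
            seen.add(schema)
            return inline_named_types(definitions[schema], definitions, seen)
        return schema
    if isinstance(schema, list):
        # A JSON array is a union: resolve every branch.
        return [inline_named_types(b, definitions, seen) for b in schema]
    resolved = dict(schema)
    kind = resolved.get("type")
    if kind == "record":
        seen.add(resolved["name"])
        resolved["fields"] = [
            {**field, "type": inline_named_types(field["type"], definitions, seen)}
            for field in resolved["fields"]
        ]
    elif kind == "array":
        resolved["items"] = inline_named_types(resolved["items"], definitions, seen)
    elif kind == "map":
        resolved["values"] = inline_named_types(resolved["values"], definitions, seen)
    return resolved


# Trimmed copies of rentalrecord.avsc and employee.avsc ("doc" omitted).
rentalrecord = {
    "type": "record",
    "name": "RENTALRECORD",
    "namespace": "",
    "fields": [{"name": "due_date", "type": "int"}],
}
employee = {
    "type": "record",
    "name": "EMPLOYEE",
    "namespace": "",
    "fields": [
        {"name": "rentalrecord", "type": "RENTALRECORD"},
        {"name": "hire_date", "type": "int"},
    ],
}

definitions = {rentalrecord["name"]: rentalrecord}
merged = inline_named_types(employee, definitions)
print(json.dumps(merged, indent=2))
```

The merged JSON could then be written to a file and referenced through avro.schema.url, at the cost of a generation step every time rentalrecord.avsc changes.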
Any help would be greatly appreciated.