I am trying to load the following XML content into Hive using a SerDe:
<?xml version="1.0"?>
<RootTag xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.website.com/service">
<Code>123</Code>
<ParentElement>
<Entity>
<EntityId>A</EntityId>
<EntityCode i:nil="true"/>
</Entity>
<Entity>
<EntityId>M</EntityId>
<EntityCode i:nil="true"/>
</Entity>
</ParentElement>
</RootTag>
The Hive table is created as follows:
CREATE EXTERNAL TABLE database.mytable(
code String,
Entity array<struct<Entity:struct<EntityId:String,EntityCode:String>>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES
(
"column.xpath.Code" = "/RootTag/Code/text()",
"column.xpath.ParentElement" = "/RootTag/ParentElement"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/xml_content/'
TBLPROPERTIES ("xmlinput.start" = "<RootTag", "xmlinput.end" = "</RootTag>");
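I have two problems:
- When I set "xmlinput.start" as shown above, it does not work. I have to manually delete the `xmlns:i=... /service` content that follows "RootTag" in the opening tag before the XML starts being parsed.
- Even after that, there is another problem with the "EntityCode" element. I get the following error message: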
Caused by: org.apache.hive.service.cli.HiveSQLException:
java.io.IOException:
org.apache.hadoop.hive.serde2.SerDeException:
java.lang.RuntimeException:
org.xml.sax.SAXParseException;
lineNumber: 41;
columnNumber: 33;
The prefix "i" for attribute "i:nil" associated with an element type "ParentCode" is not bound.
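For reference, here is a minimal sketch that seems to reproduce the same kind of parser error outside Hive. It assumes a namespace-aware JAXP parser, which may or may not match what the SerDe uses internally. The class name `UnboundPrefixDemo`, the method `parses`, and the two shortened XML strings are hypothetical and only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class UnboundPrefixDemo {
    // Shortened record WITHOUT the xmlns:i declaration on the root tag
    // (what the record looks like after the namespace attributes are deleted by hand).
    static final String WITHOUT_DECLARATION =
        "<RootTag><Entity><EntityId>A</EntityId>"
        + "<EntityCode i:nil=\"true\"/></Entity></RootTag>";

    // Same record WITH the xmlns:i declaration kept on the root tag.
    static final String WITH_DECLARATION =
        "<RootTag xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\">"
        + "<Entity><EntityId>A</EntityId>"
        + "<EntityCode i:nil=\"true\"/></Entity></RootTag>";

    // Returns true if the XML parses with a namespace-aware parser, false otherwise.
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // assumption: the failing parser is namespace-aware
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // An undeclared prefix such as "i" in i:nil is reported here as a parse error.
            System.out.println("Parse failed: " + e.getMessage());
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without xmlns:i -> " + parses(WITHOUT_DECLARATION));
        System.out.println("with xmlns:i    -> " + parses(WITH_DECLARATION));
    }
}
```

If this sketch reflects what happens inside the SerDe, the second error may simply be a consequence of my workaround for the first problem, since removing the `xmlns:i` declaration leaves the `i:nil` attributes with an undeclared prefix.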
What am I doing wrong? Thanks in advance for any suggestions and comments.