我正在尝试将嵌套的 XML 数据加载到 Hive 中。样本数据如下...
<CustomerOrders>
<Customers>
<CustID>ALFKI</CustID>
<Orders>
<OrderID>10643</OrderID>
<CustomerID>ALFKI</CustomerID>
<OrderDate>1997-08-25</OrderDate>
</Orders>
<Orders>
<OrderID>10692</OrderID>
<CustomerID>ALFKI</CustomerID>
<OrderDate>1997-10-03</OrderDate>
</Orders>
<CompanyName>Alfreds Futterkiste</CompanyName>
</Customers>
<Customers>
<CustID>ANATR</CustID>
<Orders>
<OrderID>10308</OrderID>
<CustomerID>ANATR</CustomerID>
<OrderDate>1996-09-18</OrderDate>
</Orders>
<CompanyName>Ana Trujillo Emparedados y helados</CompanyName>
</Customers>
</CustomerOrders>
以下是我正在使用的命令:
CREATE TABLE CUSTOMERORDERS(
CustID STRING,
Orders ARRAY<STRUCT<OrderID:STRING,CustomerID:STRING,OrderDate:STRING>>,
CompanyName STRING)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.CustID"="/Customers/CustID/text()",
"column.xpath.Orders"="/Customers/Orders",
"column.xpath.OrderID"="/Customers/Orders/OrderID",
"column.xpath.CustomerID"="/Customers/Orders/CustomerID",
"column.xpath.OrderDate"="/Customers/Orders/OrderDate",
"column.xpath.CompanyName"="/Customers/CompanyName/text()")
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES ("xmlinput.start"="<Customers>","xmlinput.end"= "</Customers>");
我得到的输出是:
hive> select * from customerorders;
OK
ALFKI [{"orderid":null,"customerid":null,"orderdate":null},{"orderid":null,"customerid":null,"orderdate":null}] Alfreds Futterkiste
ANATR [{"orderid":null,"customerid":null,"orderdate":null}] Ana Trujillo Emparedados y helados
Time taken: 0.039 seconds, Fetched: 2 row(s)
我得到,和的null
值。谁能帮我解决这个问题?OrderID
CustomerID
OrderDate
谢谢