
I am trying to load XML data into Hive, but I am getting this error:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":""}

The XML file I am using is:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>

The Hive queries I am using are:

1) Create TABLE xmltable(xmldata string) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/user/xmlfile.xml' OVERWRITE INTO TABLE xmltable;

2) CREATE VIEW xmlview (id,genre,price)
AS SELECT
xpath(xmldata, '/catalog[1]/book[1]/id'),
xpath(xmldata, '/catalog[1]/book[1]/genre'),
xpath(xmldata, '/catalog[1]/book[1]/price')
FROM xmltable;

3) CREATE TABLE xmlfinal AS SELECT * FROM xmlview;

4) SELECT * FROM xmlfinal WHERE id = '11';

Everything works fine up to the second query, but when I execute the third query it gives me an error.

The error is as follows:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error    while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675)
    at org.apache.hadoop.hive.ql.exec

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

So what is going wrong? I am also using a proper XML file.

Thanks, Shree


6 Answers


Get the jar here --> Brickhouse,

Sample examples here -> examples

A similar example on Stack Overflow - here

Solution:

--Load xml data to table
DROP table xmltable;
Create TABLE xmltable(xmldata string) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/vijay/data-input.xml' OVERWRITE INTO TABLE xmltable;

-- check contents
SELECT * from xmltable;

-- create view
Drop view  MyxmlView;
CREATE VIEW MyxmlView(id, genre, price) AS
SELECT
 xpath(xmldata, 'catalog/book/id/text()'),
 xpath(xmldata, 'catalog/book/genre/text()'),
 xpath(xmldata, 'catalog/book/price/text()')
FROM xmltable;

-- check view
SELECT id, genre,price FROM MyxmlView;


ADD jar /home/vijay/brickhouse-0.7.0-SNAPSHOT.jar;  --Add brickhouse jar 

CREATE TEMPORARY FUNCTION array_index AS 'brickhouse.udf.collect.ArrayIndexUDF';
CREATE TEMPORARY FUNCTION numeric_range AS 'brickhouse.udf.collect.NumericRange';

SELECT
   array_index( id, n ) as my_id,        -- pick the n-th element of each xpath array
   array_index( genre, n ) as my_genre,
   array_index( price, n ) as my_price
from MyxmlView
lateral view numeric_range( size( id )) MyxmlView as n;  -- n = 0 .. size(id)-1, one output row per book

Output:

hive > SELECT
     >    array_index( id, n ) as my_id,
     >    array_index( genre, n ) as my_genre,
     >    array_index( price, n ) as my_price
     > from MyxmlView
     > lateral view numeric_range( size( id )) MyxmlView as n;
Automatically selecting local only mode for query
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/vijay/.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2014-07-09 05:36:45,220 null map = 0%,  reduce = 0%
2014-07-09 05:36:48,226 null map = 100%,  reduce = 0%
Ended Job = job_local_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
my_id      my_genre      my_price
11      Computer        44
44      Fantasy 5

Time taken: 8.541 seconds, Fetched: 2 row(s)

Adding more information as requested by the question owner:


Answered 2014-07-09T12:44:21.113

Reason for the error:

1) Case 1 (your case): the XML content is fed into Hive line by line.

Input XML:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>  

Check in Hive:

select count(*) from xmltable;  -- returns 13 rows: each line of the file lands in its own row of the xmldata column

Why this fails:

The XML is read as 13 disjoint pieces, so it is not a valid XML document.

2) Case 2: the XML content should be fed to Hive as a single string - then the xpath UDFs work. Reference syntax: all of these functions follow the form xpath_*(xml_string, xpath_expression_string). (source)

input.xml:

<?xml version="1.0" encoding="UTF-8"?><catalog><book><id>11</id><genre>Computer</genre><price>44</price></book><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog>

Check in Hive:

select count(*) from xmltable;  -- returns 1 row: the XML is read as one complete document

Approach:

xmldata   = <?xml version="1.0" encoding="UTF-8"?><catalog><book> ...... </catalog>

Then apply your xpath UDF like this:

select xpath(xmldata, 'xpath_expression_string' ) from xmltable
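
If you cannot reformat the source file outside Hive, the single-string row can also be rebuilt inside Hive. This is only a sketch, assuming Hive 0.13+ for collect_list; note that collect_list does not guarantee row order, so verify the reassembled string (or sort on a line-number column first). xmltable_single is just an illustrative name.

-- Sketch: glue the per-line rows back into one XML string
CREATE TABLE xmltable_single AS
SELECT concat_ws('', collect_list(xmldata)) AS xmldata
FROM xmltable;

SELECT count(*) FROM xmltable_single;   -- should now return 1 row
SELECT xpath(xmldata, '/catalog/book/id/text()') FROM xmltable_single;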
Answered 2014-07-08T13:09:50.570

Then follow the steps below to get the desired result; just change the source data to:

 <catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog>
<catalog><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog> 

Now try the following:

select xpath(xmldata, '/catalog/book/id/text()')as id,
xpath(xmldata, '/catalog/book/genre/text()')as genre,
xpath(xmldata, '/catalog/book/price/text()')as price FROM xmltable;

Now you will get answers like this:

["11"] ["Computer"] ["44"]

["44"] ["Fantasy"] ["5"]

If you apply the xpath_string / xpath_int UDFs instead, you will get something like

11 Computer 44

44 Fantasy 5
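
A minimal sketch of such a query, assuming the one-record-per-line source data above and the built-in xpath_int / xpath_string UDFs:

SELECT
  xpath_int(xmldata, '/catalog/book/id')       AS id,     -- scalar int instead of ["11"]
  xpath_string(xmldata, '/catalog/book/genre') AS genre,  -- scalar string instead of ["Computer"]
  xpath_int(xmldata, '/catalog/book/price')    AS price
FROM xmltable;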

Thanks

Answered 2017-05-25T01:17:04.453

First, try loading the file by giving its full path; this should solve your problem, as it did in my case.

Answered 2014-10-14T11:18:10.313

Oracle XML Extensions for Hive can be used to create Hive tables over XML like this: https://docs.oracle.com/cd/E54130_01/doc.26/e54142/oxh_hive.htm#BDCUG691

Answered 2014-12-12T04:34:46.240

Also make sure there is no whitespace after the last closing tag of the XML file. In my case the source file had one, and whenever I loaded the file into Hive the resulting table contained NULLs, so whenever I applied an xpath function the results contained entries like [] [] [] [] [] [].

Although the xpath_string function worked, the xpath_double and xpath_int functions never did. They kept throwing this exception -

Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"line":""}
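
A minimal sketch of one way to guard against that, assuming the one-record-per-line layout and a string column named xmldata: filter out blank lines before applying the numeric xpath UDFs.

SELECT
  xpath_int(xmldata, '/catalog/book/id')    AS id,
  xpath_int(xmldata, '/catalog/book/price') AS price
FROM xmltable
WHERE xmldata IS NOT NULL AND length(trim(xmldata)) > 0;   -- skip empty trailing lines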
Answered 2018-06-06T07:36:40.877