
I am creating an external table in Hive (Hive 2.1.1-mapr-1703) from an XML file, using an XML SerDe. The file is the XML example from the W3C consortium.

This is my code to create the table:

add jar /mapr/localpath/hivexmlserde-1.0.5.3.jar;
USE my_db;
CREATE EXTERNAL TABLE frank_books (
category STRING,
title STRING,
language STRING,
year BIGINT
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.category" = "/book/@category",
"column.xpath.title"    = "/book/title/text()",
"column.xpath.language" = "/book/title/@lang",
"column.xpath.year"     = "/book/year/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/mapr/localpath/database_files/xml_example'
TBLPROPERTIES (
"xmlinput.start" = "<book category",
"xmlinput.stop" = "</book>"
);

The table itself exists, because the describe statement runs without error:

describe frank_books;
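A plain describe only confirms the column schema. To also see the SerDe, the input/output formats, the SerDe properties (column.xpath.*) and the table properties (xmlinput.*) that Hive actually stored in the metastore, the standard Hive statements below can be used. This is a diagnostic sketch added for illustration and was not part of the original question.

```sql
-- Columns plus storage details: SerDe library, InputFormat/OutputFormat,
-- SerDe parameters (column.xpath.*) and table parameters (xmlinput.*).
DESCRIBE FORMATTED my_db.frank_books;

-- The DDL Hive reconstructs from the metastore, including TBLPROPERTIES.
SHOW CREATE TABLE my_db.frank_books;
```

Both outputs list the property names exactly as they were stored, so a misnamed property shows up in them.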

However, a simple select statement like the following causes a NullPointerException:

select * from my_db.frank_books;

This is the output:

OK
Failed with exception java.io.IOException:java.lang.NullPointerException
Time taken: 1.117 seconds

Can anyone help and explain the error to me?

Thanks, Frank


1 Answer


Could it be something MapR-specific?

hive> DROP TABLE IF EXISTS xml_45158949;
OK
Time taken: 0.977 seconds
hive> 
    > CREATE  TABLE xml_45158949(
    > category STRING,
    > title STRING,
    > language STRING,
    > year BIGINT
    > )
    > ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    > WITH SERDEPROPERTIES(
    > "column.xpath.category" = "/book/@category",
    > "column.xpath.title"    = "/book/title/text()",
    > "column.xpath.language" = "/book/title/@lang",
    > "column.xpath.year"     = "/book/year/text()"
    >   )
    > STORED AS 
    > INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
    > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    > TBLPROPERTIES (
    > "xmlinput.start"="<book category",
    > "xmlinput.end"="</book>"
    > );
OK
Time taken: 0.243 seconds
hive> 
    > load data local inpath '/Users/dvasilen/Misc/XML/45158949.xml' OVERWRITE into table xml_45158949;
Loading data to table default.xml_45158949
OK
Time taken: 0.153 seconds
hive> 
    > select * from xml_45158949;
OK
cooking     Everyday Italian    en  2005
children    Harry Potter        en  2005
web         XQuery Kick Start   en  2003
web         Learning XML        en  2003
Time taken: 0.08 seconds, Fetched: 4 row(s)
hive> 

Seems to work for me.
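Apart from EXTERNAL/LOCATION versus LOAD DATA, the two statements differ in one table property. The question names the end marker "xmlinput.stop", while this answer uses "xmlinput.end", which is the name the hivexmlserde documentation gives alongside "xmlinput.start". If the input format finds no end marker under the name it expects, it has no end tag to split records on. That is a plausible, but unconfirmed, source of the NullPointerException, rather than anything MapR-specific. Assuming that is the cause, a minimal sketch of the fix on the existing table:

```sql
-- Assumption: the NPE comes from the end marker being stored as "xmlinput.stop".
-- Add the property under the name XmlInputFormat reads.
ALTER TABLE my_db.frank_books SET TBLPROPERTIES ("xmlinput.end" = "</book>");

-- Re-run the query that previously failed.
SELECT * FROM my_db.frank_books;
```

Because frank_books is an EXTERNAL table, dropping it and re-creating it with "xmlinput.end" in place of "xmlinput.stop" would work just as well. Dropping an external table does not delete the files under its LOCATION.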

answered 2017-07-18T23:21:50.013