
I'm very new to HiveQL and I'm a bit stuck :S

I have data stored in XML format, and I want to extract the fields (string Titles_2, string Artists_2, string Albums_2) from this XML into Hive.

Sample of the XML data:

<?xml version="1.0" encoding="UTF-8"?><MC><SC><S uid="2" gen="" yr="2011" art="Samsung" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Samsung/Music" alb="Samsung" ttl="Over the horizon"/><S uid="37" gen="" yr="2010" art="Jason Derulo" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Jason Derulo/Jason Derulo" alb="Jason Derulo" ttl="Whatcha Say"/><S uid="38" gen="" yr="2010" art="Jason Derulo" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Jason Derulo/Jason Derulo" alb="Jason Derulo" ttl="In My Head"/><S uid="39" gen="" yr="2011" art="Alexandra Stan" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Alexandra Stan/Mr_ Saxobeat - Single" alb="Mr. Saxobeat - Single" ttl="Mr. Saxobeat (Extended Version)"/><S uid="40" gen="" yr="2011" art="Bushido" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Bushido/Jenseits von Gut und Böse (Premium Edition)" alb="Jenseits von Gut und Böse (Premium Edition)" ttl="Wie ein Löwe"/><S uid="41" gen="" yr="2011" art="Bushido" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Bushido/Jenseits von Gut und Böse (Premium Edition)" alb="Jenseits von Gut und Böse (Premium Edition)" ttl="Verreckt"/><S uid="42" gen="" yr="2011" art="Lucenzo" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Lucenzo/Danza Kuduro (feat_ Don Omar) [From _Fast &amp; Furious 5_] - Single" alb="Danza Kuduro (feat. Don Omar) [From &quot;Fast &amp; Furious 5&quot;] - Single" ttl="Danza Kuduro (feat. Don Omar) [From &quot;Fast &amp; Furious 5&quot;]"/><S uid="121" gen="" yr="701" art="Michael Jackson" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/external_sd/Music/Michael Jackson/Bad [Bonus Tracks]" alb="Bad [Bonus Tracks]" ttl="Voice-Over Intro/Quincy Jones Interview #1 [*]"/></SC><PC/></MC>

This data is stored in a table named xmlout_2(line).

I ran these xpath commands to build the Stores view, but it only picks up the first song from each row. Any idea why this happens?

    CREATE VIEW xmlout_2(line) AS SELECT * FROM hivetesttable;

    CREATE VIEW Stores(Titles_2, Artists_2, Albums_2) AS
    SELECT
        xpath_string(line, '/MC/SC/*/@ttl'),
        xpath_string(line, '/MC/SC/*/@art'),
        xpath_string(line, '/MC/SC/*/@alb')
    FROM xmlout_2;

If I use xpath instead of xpath_string, I get an array of strings in each column rather than a single string.

    CREATE VIEW xmlout_2(line) AS SELECT * FROM hivetesttable;

    CREATE VIEW Stores(Titles_2, Artists_2, Albums_2) AS
    SELECT
        xpath(line, '/MC/SC/*/@ttl'),
        xpath(line, '/MC/SC/*/@art'),
        xpath(line, '/MC/SC/*/@alb')
    FROM xmlout_2;

I was thinking of exploding the columns after that, but explode can only be applied to a single column.
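For context on the two behaviours above: in XPath 1.0, converting a node set to a string yields the string value of its first node in document order, which would explain why xpath_string returns only the first song per row, while xpath returns every match as an array. One possible way around the single-column limit of explode is to explode only one array together with its position and use that position to index the other two. The following is a minimal sketch, not a verified solution. It assumes a Hive version that provides posexplode (0.13 or later) and that every S element carries all three attributes, so the three arrays have the same length and order. The aliases x, t, pos and title are illustrative names only.

```sql
-- Sketch: one output row per song instead of one row per XML document.
-- Assumes posexplode is available and that ttl, art and alb are present
-- on every <S> element (otherwise the arrays would not line up).
CREATE VIEW Stores(Titles_2, Artists_2, Albums_2) AS
SELECT
    t.title,           -- exploded element of the titles array
    x.artists[t.pos],  -- artist at the same position
    x.albums[t.pos]    -- album at the same position
FROM (
    SELECT
        xpath(line, '/MC/SC/S/@ttl') AS titles,
        xpath(line, '/MC/SC/S/@art') AS artists,
        xpath(line, '/MC/SC/S/@alb') AS albums
    FROM xmlout_2
) x
LATERAL VIEW posexplode(x.titles) t AS pos, title;
```

If some S elements could lack an attribute, the positional matching would pair values from different songs, so that case would need a different approach.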
