hadoop - Extracting an array of structs in Hive

Question

I have an external table in Hive:

CREATE EXTERNAL TABLE FOO (  
  TS string,  
  customerId string,  
  products array< struct <productCategory:string, productId:string> >  
)  
PARTITIONED BY (ds string)  
ROW FORMAT SERDE 'some.serde'  
WITH SERDEPROPERTIES ('error.ignore'='true')  
LOCATION 'some_locations'  
;

A record in the table might contain the following data:

1340321132000, 'some_company', [{"productCategory":"footwear","productId":"nik3756"},{"productCategory":"eyewear","productId":"oak2449"}]

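Does anyone know whether there is a simple way to extract every productCategory from this record and return them as a productCategories array without using explode? Something like the following: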

["footwear", "eyewear"]

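Or do I need to write my own GenericUDF? If so, I don't know Java very well (I'm a Ruby person), so could someone give me some hints? I have read some of the Apache Hive notes on UDFs, but I don't know which collection type is best for handling arrays, or which is best for handling structs.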

===

I answered this myself by writing a GenericUDF, but ran into two other problems, which I have posted in this SO question.

score 1 · Accepted Answer

You can use a JSON SerDe, or the built-in functions get_json_object and json_tuple.

With rcongiu's Hive-JSON SerDe, usage would be:

Define the table:

CREATE TABLE complex_json (
  DocId string,
  Orders array<struct<ItemId:int, OrderDate:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

Load the sample JSON into it (it is important that each record sits on a single line):

{"DocId":"ABC","Orders":[{"ItemId":1111,"OrderDate":"11/11/2012"},{"ItemId":2222,"OrderDate":"12/12/2012"}]}

Getting the item IDs is then as simple as:

SELECT Orders.ItemId FROM complex_json LIMIT 100;

It returns the list of IDs:

itemid
[1111,2222]

I have verified that this returns the correct result in my environment. Full listing:

add jar hdfs:///tmp/json-serde-1.3.6.jar;

CREATE TABLE complex_json (
  DocId string,
  Orders array<struct<ItemId:int, OrderDate:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

LOAD DATA INPATH '/tmp/test.json' OVERWRITE INTO TABLE complex_json;

SELECT Orders.ItemId FROM complex_json LIMIT 100;

Read more here:

http://thornydev.blogspot.com/2013/07/querying-json-records-via-hive.html

score 1 · Accepted Answer

One way is to use the inline or explode function, like this:

SELECT
    TS,
    customerId,
    pCat,
    pId
FROM FOO
LATERAL VIEW inline(products) p AS pCat, pId;
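To make the row shape concrete, here is a plain-Java illustration (not Hive code, and not how Hive implements it) of what a LATERAL VIEW over inline(products) yields for the sample record: the input row is repeated once per struct, and the struct's fields become the extra columns pCat and pId. The record types FooRow, Product and OutRow are hypothetical stand-ins for the table's columns, and the sketch needs Java 16+ for records. Note that the result is two rows, not one array, which is why the question asks for an alternative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Plain-Java illustration of the rows produced by
 *   LATERAL VIEW inline(products) p AS pCat, pId
 * Hypothetical types; this only models the semantics, it is not Hive code.
 */
public class InlineSketch {

    // Stand-ins for struct<productCategory:string, productId:string> and a FOO row.
    record Product(String productCategory, String productId) {}
    record FooRow(String ts, String customerId, List<Product> products) {}
    record OutRow(String ts, String customerId, String pCat, String pId) {}

    static List<OutRow> inline(List<FooRow> rows) {
        List<OutRow> out = new ArrayList<>();
        for (FooRow row : rows) {
            // A NULL (or empty) array produces no output rows for this input row,
            // matching a plain LATERAL VIEW without OUTER.
            if (row.products() == null) {
                continue;
            }
            for (Product p : row.products()) {
                // One output row per struct; the parent columns are repeated.
                out.add(new OutRow(row.ts(), row.customerId(),
                        p.productCategory(), p.productId()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        FooRow row = new FooRow("1340321132000", "some_company", Arrays.asList(
                new Product("footwear", "nik3756"),
                new Product("eyewear", "oak2449")));
        inline(Arrays.asList(row)).forEach(System.out::println);
    }
}
```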

Otherwise, you can write a UDF. Have a look at this post and this post, along with the following resources:

score 0 · Accepted Answer

If the array has a fixed size (say, 2), try:

SELECT products[0].productCategory, products[1].productCategory
FROM FOO;

If not, a UDF is the right solution. I think you could also do it in JRuby. Good luck!
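If you do write a UDF, the per-row logic is small. Below is a dependency-free Java sketch of just that core step, assuming Hive's standard Java representation, in which an array arrives as a java.util.List and a struct as a List of field values in declaration order. The class and method names (ExtractStructField, extractField) are hypothetical. A real GenericUDF would extend org.apache.hadoop.hive.ql.udf.generic.GenericUDF and read values through ListObjectInspector and StructObjectInspector rather than casting, because a SerDe may hand the UDF lazy objects instead of plain Lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of the core of a hypothetical "extract one field from array<struct>" UDF.
 * Assumption: array -> java.util.List, struct -> List of field values in
 * declaration order. No Hive classes are used here.
 */
public class ExtractStructField {

    // Collects the value at fieldIndex from every struct in the array.
    // A NULL array returns null, as most Hive functions do for NULL input.
    static List<Object> extractField(List<List<Object>> structs, int fieldIndex) {
        if (structs == null) {
            return null;
        }
        List<Object> result = new ArrayList<>(structs.size());
        for (List<Object> struct : structs) {
            // A NULL struct element contributes a NULL entry instead of failing.
            result.add(struct == null ? null : struct.get(fieldIndex));
        }
        return result;
    }

    public static void main(String[] args) {
        // products: array<struct<productCategory:string, productId:string>>
        List<List<Object>> products = Arrays.asList(
                Arrays.<Object>asList("footwear", "nik3756"),
                Arrays.<Object>asList("eyewear", "oak2449"));
        // productCategory is field 0 in the table definition.
        System.out.println(extractField(products, 0));
    }
}
```

In the real UDF, initialize() would look up the struct field once (by name, via StructObjectInspector.getStructFieldRef) and return a list-of-string ObjectInspector, so evaluate() only has to loop as above.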
