由于问题是关于查找包中的最后一个元素,因此您可以使用以下适用于稍微不同的数据集的代码:
{"uid":"23423423423","payments":[{"timestamp":"2014-11-12 10:21","payment_id":1,"data":"payment 1 data"},{"timestamp":"2014-12-12 07:20","payment_id":2,"data":"payment 2 data"}]}
Pig 脚本如下所示:
data = LOAD '$INPUT'
USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json: map[]);
data = FOREACH data GENERATE
json#'uid' as uid:chararray,
json#'payments' as payments:bag{};
row = FOREACH data {
item = ORDER payments BY * DESC;
item = LIMIT item 1;
item = FOREACH item GENERATE $0 as arr:map[];
item = FOREACH item GENERATE
arr#'timestamp' as timestamp:chararray,
arr#'payment_id' as payment_id:int,
arr#'data' as data:chararray;
GENERATE uid, FLATTEN(item) as (timestamp, payment_id, data);
};
DUMP row;