
I have a relation in Pig that looks like this:

([account_id#100,
 timestamp#1434,
 id#900],

[account_id#100,
 timestamp#1434,
 id#901],

[account_id#100,
 timestamp#1434,
 id#902])

As you can see, I have three map objects in a single tuple. All of the data above sits inside field $0 of the relation, so it is held in a single bytearray column.

The data is loaded as follows:

data = load 's3://data/data' using com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');

DESCRIBE data;

data: {bytearray}

How can I split this data structure into three rows so that the output looks like this?

data: {account_id:chararray, timestamp:chararray, id:int}
(100, 1434,900)
(100, 1434,901)
(100, 1434,902)

1 Answer


It is hard to pin down the problem without sample input data. If this is an intermediate result, write it out with STORE and share the output file so that others can load it and try things out. I was able to solve this with STRSPLIT, but I am not sure whether the input is a single column in a single row, or three separate rows with the same column.

In either case, flattening the data with the FLATTEN operator and then applying STRSPLIT should help. With more information and input data, I can give a working example.

Data -> FLATTEN (to get the records out of the bag) -> STRSPLIT on "," inside a FOREACH ... GENERATE
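As a rough sketch of that pipeline, and not a tested solution for the JsonLoader output in the question: the script below assumes the three records arrive as delimited text in one field, which the question does not confirm. The file name records.txt, the ';' record separator, and the aliases raw, recs, cols and result are all hypothetical and chosen only for illustration.

```pig
-- Hypothetical input file 'records.txt' (not from the question), one line:
--   100,1434,900;100,1434,901;100,1434,902
raw = LOAD 'records.txt' USING TextLoader() AS (line:chararray);

-- TOKENIZE with a custom delimiter returns a bag of records;
-- FLATTEN turns each element of that bag into its own row.
recs = FOREACH raw GENERATE FLATTEN(TOKENIZE(line, ';')) AS rec:chararray;

-- STRSPLIT returns a tuple; FLATTEN spreads it into three columns.
cols = FOREACH recs GENERATE
           FLATTEN(STRSPLIT(rec, ',', 3))
           AS (account_id:chararray, timestamp:chararray, id:chararray);

-- Cast id to int to match the schema asked for in the question.
result = FOREACH cols GENERATE account_id, timestamp, (int) id AS id;

DUMP result;
```

If field $0 really is a tuple of maps rather than text, the STRSPLIT step would not apply as written; the values would more likely be read with map lookups such as m#'account_id' once the field has a usable schema.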
Answered 2015-06-22T20:35:57.400