I need to propagate a field value from one row to the other records of a given type. For example, my raw input is:

1,firefox,p  
1,,q
1,,r
1,,s
2,ie,p
2,,s
3,chrome,p
3,,r
3,,s
4,netscape,p

The desired result is:

1,firefox,p  
1,firefox,q
1,firefox,r
1,firefox,s
2,ie,p
2,ie,s
3,chrome,p
3,chrome,r
3,chrome,s
4,netscape,p

I tried:

A = LOAD 'file1.txt' using PigStorage(',') AS (id:int,browser:chararray,type:chararray);
SPLIT A INTO B IF (type =='p'), C IF (type!='p' );
joined =  JOIN B BY id FULL, C BY id;
joinedFields = FOREACH joined GENERATE  B::id,  B::type, B::browser, C::id, C::type;
dump joinedFields;

The result I get is:

(,,,1,p  )
(,,,1,q)
(,,,1,r)
(,,,1,s)
(2,p,ie,2,s)
(3,p,chrome,3,r)
(3,p,chrome,3,s)
(4,p,netscape,,)

Any help is appreciated, thanks.

1 Answer

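Pig is not exactly SQL; it was built with data flows, MapReduce, and groups in mind (joins exist too). You can get the result with a GROUP BY, followed by a FILTER nested inside a FOREACH, and FLATTEN.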

inpt = LOAD 'file1.txt' using PigStorage(',') AS (id:int,browser:chararray,type:chararray);
grp = GROUP inpt BY id;
Result = FOREACH grp {
    P = FILTER inpt BY type == 'p'; -- keep only the records of type 'p' for this id
    PL = LIMIT P 1; -- make sure there is just one
    GENERATE FLATTEN(inpt.(id,type)), FLATTEN(PL.browser); -- convert the bags produced by GROUP BY back to rows
}; 
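
If you prefer to stay with a join, as in the question, here is a minimal sketch of one way it could be done. It rests on two assumptions: that each id has at most one 'p' row, and that the `(,,,1,p  )` line in the dumped output means the first record's type carries trailing whitespace, so it never compares equal to 'p'. The aliases `cleaned`, `p_rows`, `browsers`, `joined` and `filled` are made up for this example.

```pig
-- Sketch only: cleaned, p_rows, browsers, joined and filled are hypothetical aliases.
inpt = LOAD 'file1.txt' USING PigStorage(',') AS (id:int, browser:chararray, type:chararray);

-- Strip stray whitespace so that a value like 'p  ' compares equal to 'p'.
cleaned = FOREACH inpt GENERATE id, browser, TRIM(type) AS type;

-- Keep only the 'p' rows, which carry the browser name for each id.
p_rows = FILTER cleaned BY type == 'p';
browsers = FOREACH p_rows GENERATE id, browser;

-- Inner join: every row picks up the browser of the 'p' row with the same id.
joined = JOIN cleaned BY id, browsers BY id;

-- Project the columns in the order id, browser, type.
filled = FOREACH joined GENERATE cleaned::id AS id, browsers::browser AS browser, cleaned::type AS type;
DUMP filled;
```

Note that an inner join drops any id that has no 'p' row at all; `JOIN cleaned BY id LEFT OUTER, browsers BY id` would keep those rows with a null browser instead. The same trailing-whitespace caveat applies to the nested FILTER above, since FLATTEN of an empty bag produces no rows for that id.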
answered 2012-09-27T20:42:09.817