I have two datasets:

A = {uid, url}; B = {uid, url};

Now I do a cogroup:

C = COGROUP A BY uid, B BY uid;

I want to turn C into { group AS uid, DISTINCT A.url+B.url };

My question is: how do I concatenate the two bags A.url and B.url?

Or, put another way, how do I apply DISTINCT across multiple columns?

2 Answers

0

This may not be exactly what you expect, but here is what I understood from your question:

C = JOIN A BY uid, B BY uid;
D = DISTINCT C;

The concatenation is done like this:

E = FOREACH D GENERATE CONCAT(A::uid,B::uid); 
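Note that JOIN only keeps uids present in both relations, and CONCAT joins two strings rather than two bags. If the goal is one bag of distinct urls per uid, a minimal sketch (assuming both inputs share the same two-field schema; the `chararray` types and the relation names U, V, G and R are illustrative and not from the question) could stack the relations before grouping:

```pig
-- Assumed schema: both files hold (uid, url) pairs as text.
A = LOAD 'A' USING PigStorage() AS (uid:chararray, url:chararray);
B = LOAD 'B' USING PigStorage() AS (uid:chararray, url:chararray);

-- Stack the two relations; this is the relation-level way to "concatenate" them.
U = UNION A, B;

-- DISTINCT compares whole tuples, so it deduplicates on (uid, url) together.
V = DISTINCT U;

-- One row per uid, carrying a bag of that uid's distinct urls.
G = GROUP V BY uid;
R = FOREACH G GENERATE group AS uid, V.url AS urls;

DUMP R;
```

Because DISTINCT works on entire tuples, this also covers the "DISTINCT across multiple columns" phrasing: project the columns of interest first, then apply DISTINCT.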
answered 2013-03-01T17:27:49.637
0
A = LOAD 'A' USING PigStorage() AS (uid, url);
B = LOAD 'B' USING PigStorage() AS (uid, url);
C = JOIN A BY uid, B BY uid;
D = FOREACH C GENERATE $0, CONCAT(A::url, B::url);
E = DISTINCT D;
DUMP E;
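The script above yields one concatenated string per matching (A, B) row pair. If a deduplicated bag per uid is wanted instead, one possible variant is a nested DISTINCT inside FOREACH. This is only a sketch: the typed schema and the names U, G, R, all_urls and uniq_urls are assumptions made for illustration.

```pig
-- Assumed schema: both files hold (uid, url) pairs as text.
A = LOAD 'A' USING PigStorage() AS (uid:chararray, url:chararray);
B = LOAD 'B' USING PigStorage() AS (uid:chararray, url:chararray);

-- Combine both inputs, then group so each uid carries all of its rows.
U = UNION A, B;
G = GROUP U BY uid;

-- Inside the nested block, project the url column and deduplicate it per group.
R = FOREACH G {
    all_urls  = U.url;
    uniq_urls = DISTINCT all_urls;
    GENERATE group AS uid, uniq_urls AS urls;
};

DUMP R;
```

Unlike the JOIN, this keeps uids that appear in only one of the two inputs.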
answered 2015-10-23T07:56:15.020