
I'm trying to filter for users who have at least two countries in their profile or who are from the USA. In Pig I tried:

    B = group A by userid;
    C = foreach B  {
                count = $1.country;
                count2 = distinct count;
                GENERATE (((SIZE(count2) > 1 OR count2.$0 != 'USA') ? group : null)));
        }

But it fails with this error:

    incompatible types in NotEqual Operator left hand side:bag :tuple(country:chararray)  right hand side:chararray

I've tried various other combinations, with no luck.


1 Answer


Try this:

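    B = group A by userid;
    C = foreach B {
            countries = distinct A.country;   -- distinct countries per user
            generate
                group as userid,
                COUNT(countries) as count,
                FLATTEN(countries) as country;
        };
    D = filter C by count > 1 OR country == 'USA';
    E = foreach D generate userid;
    F = distinct E;                           -- one row per matching user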

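C is a relation with the schema {userid:chararray, count:long, country:chararray}, where count is the number of distinct countries associated with the userid. D filters by your criteria. Because FLATTEN emits one row per country, a user can appear in D more than once, so take the distinct userids if you need each user only once.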
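If you would rather keep the nested `FOREACH` from the question, the type error can also be avoided by comparing counts instead of bags. In the question, `count2.$0` projects a field out of a bag, which yields another bag rather than a chararray; that is why Pig reports `left hand side:bag`. The following is a minimal sketch, not a tested script. It assumes `A` has `userid` and `country` fields and that US users are stored as `'USA'` (the spelling the question compares against). The aliases `countries`, `us`, `n_countries`, `n_us` and `users` are illustrative only.

```pig
-- Sketch only; assumes A has fields userid and country.
B = group A by userid;
C = foreach B {
        -- bag of distinct (country) tuples for this user
        countries = distinct A.country;
        -- only the tuples whose country is 'USA' (assumed spelling)
        us = filter countries by country == 'USA';
        -- reduce both bags to scalar counts
        generate group as userid,
                 COUNT(countries) as n_countries,
                 COUNT(us) as n_us;
    };
-- long-vs-int comparisons, so no bag-vs-chararray type error
D = filter C by n_countries > 1 OR n_us > 0;
users = foreach D generate userid;
```

Because the grouping already produces one row per user and nothing is flattened, `users` needs no extra `DISTINCT` here.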

answered 2012-11-20T18:17:45.870