The following code does not return what I am trying to compute: the number of unique users. Any ideas?

data = LOAD 'input_initial' AS (user_id,item_id,rating,timestamp);
data = FOREACH data GENERATE user_id,item_id;
STORE data INTO 'input_final';
data_users = FOREACH data GENERATE user_id;
group_users = GROUP data_users BY user_id;
count_users = FOREACH group_users GENERATE COUNT(data_users);
STORE count_users INTO 'count_users';

1 Answer

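Grouping by user_id produces one group per user, so COUNT(data_users) returns a separate record count for each user instead of a single total. You need a final GROUP operation that acts on ALL rather than on a single field: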

group_users = GROUP data_users BY user_id;
grp_all = GROUP group_users ALL;
count_users = FOREACH grp_all GENERATE COUNT(group_users);
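
As an alternative, here is a minimal sketch that should give the same number, assuming the same 'input_initial' input as in the question: remove duplicate user_id values with DISTINCT, then count the remaining rows with GROUP ... ALL. The alias names users, uniq_users, grp_uniq and count_uniq are made up for this sketch.

-- Load the raw records and keep only the user_id column.
data = LOAD 'input_initial' AS (user_id,item_id,rating,timestamp);
users = FOREACH data GENERATE user_id;
-- Collapse repeated user_id values into one row per user.
uniq_users = DISTINCT users;
-- Put every remaining row into a single group so COUNT yields one total.
grp_uniq = GROUP uniq_users ALL;
count_uniq = FOREACH grp_uniq GENERATE COUNT(uniq_users);
STORE count_uniq INTO 'count_users';

Note that COUNT skips rows whose first field is null, so a null user_id is not counted here; COUNT_STAR would include it.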
Answered 2013-02-06T11:55:08.083