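I am using the IMDB database to find the highest-rated actors/actresses who appeared in the most movies in a given year. I am trying to join the actors dataset with the ratings dataset, then filter by year and sort the data by highest rating and number of movies.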
joinedActorRating = JOIN ratings by movie, actors BY movie;
actorRating = FOREACH joinedActorRating GENERATE *;
actorsYear = FILTER actorRating BY(year MATCHES '2000');
groupedYear = GROUP actorsYear BY (year,rating,firstName,lastName);
aggregatedYear = FOREACH groupedYear GENERATE group, COUNT (actorsYear) AS movieCount;
unaggregatedYear = FOREACH aggregatedYear GENERATE FLATTEN(group) AS (year,rating,firstName,lastName);
sortRating = ORDER unaggregatedYear BY rating ASC, count ASC;
dump sortRating;
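The compiler reports "Invalid field projection" on the second line, but I am not sure how to access the year field after joining the two datasets. Does anyone know how to fix this?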
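
One possible direction, sketched under assumptions because the post does not show the LOAD statements or schemas: Pig raises "Invalid field projection" when a referenced field name is missing from a relation's schema, so the sketch declares schemas explicitly. After a JOIN, fields are addressed with the `relation::field` disambiguation operator. The file paths, delimiters, column layouts, and the choice to average ratings per actor are all hypothetical and not taken from the original script.

```pig
-- Hypothetical paths and schemas; adjust to the real data files.
ratings = LOAD 'ratings.tsv' USING PigStorage('\t')
          AS (movie:chararray, year:int, rating:double);
actors  = LOAD 'actors.tsv' USING PigStorage('\t')
          AS (movie:chararray, firstName:chararray, lastName:chararray);

joined = JOIN ratings BY movie, actors BY movie;

-- After the JOIN, each field is reachable as relation::field.
actorRating = FOREACH joined GENERATE
    ratings::year     AS year,
    ratings::rating   AS rating,
    actors::firstName AS firstName,
    actors::lastName  AS lastName;

-- year is assumed to be an int here, so a numeric comparison is used.
actorsYear = FILTER actorRating BY year == 2000;

-- Assumption: one row per actor, with average rating and movie count.
groupedActor = GROUP actorsYear BY (firstName, lastName);
actorStats = FOREACH groupedActor GENERATE
    FLATTEN(group)         AS (firstName, lastName),
    AVG(actorsYear.rating) AS avgRating,
    COUNT(actorsYear)      AS movieCount;

-- Highest rating first, then most movies.
sorted = ORDER actorStats BY avgRating DESC, movieCount DESC;
DUMP sorted;
```

Two details differ from the posted script on purpose. Grouping by actor alone, instead of by (year,rating,firstName,lastName), keeps the count from being split across each distinct rating value. The ORDER step also sorts on `movieCount`, which is carried through the final FOREACH, where the posted script sorts on `count`, a field that is never defined.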