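I'm using the IMDB database to find the actor/actress with the highest rating who also appeared in the most movies in a given year. I'm trying to join the actors dataset with the ratings, then filter by year and sort the data by highest rating and number of movies.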

joinedActorRating = JOIN ratings by movie, actors BY movie;
actorRating = FOREACH joinedActorRating GENERATE *;
actorsYear = FILTER actorRating BY(year MATCHES '2000');
groupedYear = GROUP actorsYear BY (year,rating,firstName,lastName);
aggregatedYear = FOREACH groupedYear GENERATE group, COUNT (actorsYear) AS movieCount;
unaggregatedYear = FOREACH aggregatedYear GENERATE FLATTEN(group) AS (year,rating,firstName,lastName);
sortRating = ORDER unaggregatedYear BY rating ASC, count ASC;
dump sortRating; 

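The compiler reports "Invalid field projection" on the second line, but I'm not sure how to access the year field after joining the two datasets. Does anyone know how to fix this?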

1 Answer

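After the join, you need to project the fields you want to carry forward into the current relation. Fields coming out of a JOIN are named with their source relation as a prefix (for example ratings::movie), and a name that exists in both inputs, such as movie, is ambiguous without that prefix.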

joinedActorRating = JOIN ratings by movie, actors BY movie;
actorRating = FOREACH joinedActorRating GENERATE ratings::movie as movie
    , ratings::rank as rank, ratings::year as year, actors::firstName as firstName
    , actors::lastName as lastName;

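I'm not sure which columns are in which table (apart from movie, which is in both) because you didn't include the two schemas, so I just guessed. You can adjust the projection as needed.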
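With the projection in place, the rest of the pipeline from the question could look roughly like the sketch below. This is only a guess under stated assumptions: the schemas in the comments are hypothetical (the question does not show them), the rating column is assumed to be numeric and named rating as in the question's script, and year is assumed to be a chararray because the question filters it with MATCHES. The aliases groupedActor, actorStats, and sortedStats are made up for illustration.

```pig
-- Assumed schemas (hypothetical; not shown in the question):
--   ratings: (movie:chararray, rating:double, year:chararray)
--   actors:  (movie:chararray, firstName:chararray, lastName:chararray)
joinedActorRating = JOIN ratings BY movie, actors BY movie;

-- Project and rename so later steps can use short, unambiguous names.
actorRating = FOREACH joinedActorRating GENERATE
    ratings::movie    AS movie,
    ratings::rating   AS rating,
    ratings::year     AS year,
    actors::firstName AS firstName,
    actors::lastName  AS lastName;

-- Keep only the year of interest (string comparison, year assumed chararray).
actorsYear = FILTER actorRating BY year == '2000';

-- Group per actor so that COUNT gives the number of movies per actor.
groupedActor = GROUP actorsYear BY (firstName, lastName);

-- Keep the aggregates next to the flattened group key.
actorStats = FOREACH groupedActor GENERATE
    FLATTEN(group)         AS (firstName, lastName),
    AVG(actorsYear.rating) AS avgRating,
    COUNT(actorsYear)      AS movieCount;

-- Highest average rating first, ties broken by number of movies.
sortedStats = ORDER actorStats BY avgRating DESC, movieCount DESC;
DUMP sortedStats;
```

Grouping by actor only, rather than by (year, rating, firstName, lastName), is a choice made here so that the count reflects movies per actor; grouping by rating as well would count movies per actor per distinct rating value. Averaging the rating is likewise one possible reading of "highest rated", not something the question specifies.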

Answered 2015-12-07T23:09:35.837