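I've been trying out Pig for a few days, but I'm not getting the hang of it. I'm working through simple practice tasks without success. The goal is to create a record showing the maximum number of runs per year per ID. So I started with: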
A = LOAD 'pig/input/Batting.csv' using PigStorage(',') as (ID:int, year:int, stint:chararray, team:chararray, league:chararray, games:int, games_bat:int, atbat:int, runs:int);
B = GROUP A by year;
C = FOREACH B generate group, MAX(A.runs) as maxruns;
I think everything works up to this point, but it falls apart completely at the following:
D = JOIN A by year, C by year;
E = FOREACH D generate group, D.(group, ID), maxruns;
store E into 'batting_result';
Any tips or ideas on how to proceed would be greatly appreciated.
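One possible direction, as a minimal sketch rather than a definitive fix. It assumes the goal is, for each year, the ID (or IDs, on a tie) with the highest runs, and it reuses the schema from the script above unchanged. There are two likely trouble spots in the original. After `GROUP`, the key field is named `group`, so `C` has no `year` field to join on. After `JOIN`, fields are referenced as `relation::field`, and `group` no longer exists.

```pig
-- Load the batting data; schema copied as-is from the script above.
A = LOAD 'pig/input/Batting.csv' USING PigStorage(',') AS (ID:int, year:int, stint:chararray, team:chararray, league:chararray, games:int, games_bat:int, atbat:int, runs:int);

-- One bag of rows per year.
B = GROUP A BY year;

-- Rename the group key to `year` so the join below can refer to it.
C = FOREACH B GENERATE group AS year, MAX(A.runs) AS maxruns;

-- Join on (year, runs) = (year, maxruns) so only rows that reach the yearly maximum survive.
D = JOIN A BY (year, runs), C BY (year, maxruns);

-- Joined fields are disambiguated with the relation::field syntax.
E = FOREACH D GENERATE A::year AS year, A::ID AS ID, C::maxruns AS maxruns;

STORE E INTO 'batting_result';
```

If the intent is instead the maximum per (ID, year) pair, grouping with `GROUP A BY (ID, year)` and generating `FLATTEN(group), MAX(A.runs)` would avoid the join entirely.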