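I've just started learning Pig, and I'm having trouble renaming aliases. What I want to do is read a file, filter it, and then join it with itself. This is what I did: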
register s3n://uw-cse-344-oregon.aws.amazon.com/myudfs.jar
raw = LOAD 's3n://uw-cse-344-oregon.aws.amazon.com/cse344-test-file' USING TextLoader as (line:chararray);
ntriples = foreach raw generate FLATTEN(myudfs.RDFSplit3(line)) as (subject:chararray,predicate:chararray,object:chararray);
ntriples2 = foreach raw generate FLATTEN(myudfs.RDFSplit3(line)) as (subject2:chararray,predicate2:chararray,object2:chararray);
X = FILTER ntriples BY (subject matches '.*business.*');
X2 = FILTER ntriples2 BY (subject2 matches '.*business.*');
joined = join X by subject, X2 by subject2;
joined = DISTINCT joined;
store joined into '/user/hadoop/join-results' using PigStorage();
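As you can see, I read and filter the file twice so that I end up with two relations that have different aliases for each column. How can I simply copy the filtered set and assign new aliases to it? This job was supposed to take 18 minutes, but it took 1.5 hours.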
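For illustration, here is a minimal sketch of the kind of thing I am after, assuming a plain FOREACH ... GENERATE ... AS projection is enough to give the filtered relation a second set of field names. It reuses the same UDF, input, and output path as my script, parses and filters only once, and derives X2 from X. I have not timed it, so whether it actually brings the runtime down is an open question.

```pig
register s3n://uw-cse-344-oregon.aws.amazon.com/myudfs.jar

-- Load and parse the file once.
raw = LOAD 's3n://uw-cse-344-oregon.aws.amazon.com/cse344-test-file' USING TextLoader as (line:chararray);
ntriples = foreach raw generate FLATTEN(myudfs.RDFSplit3(line)) as (subject:chararray,predicate:chararray,object:chararray);

-- Filter once.
X = FILTER ntriples BY (subject matches '.*business.*');

-- Copy the filtered relation under new field names (assumption: a projection
-- with AS is sufficient for the self-join; no second LOAD or FILTER).
X2 = foreach X generate subject as subject2, predicate as predicate2, object as object2;

-- Self-join on subject, then drop duplicate rows.
joined = join X by subject, X2 by subject2;
joined = DISTINCT joined;
store joined into '/user/hadoop/join-results' using PigStorage();
```

The only change from my original script is that ntriples2 and the second FILTER are gone, and X2 is produced by renaming the fields of X.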