
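As you can see from the Pig code below, I am repeating the same set of statements for Attr1 and Attr2. Is there a way to factor them out into a function? A code example would be really helpful.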

Attr1ValidRecs = FILTER BaseRecs BY Attr1 IS NOT NULL;
Attr1ValidRecs_all = GROUP Attr1ValidRecs ALL;
Attr1Count = FOREACH Attr1ValidRecs_all GENERATE COUNT(Attr1ValidRecs);
Attr1CountStr = FOREACH Attr1Count GENERATE CONCAT('Recs with Attr1 not null : ',(chararray)$0);

Attr1BaseCross = CROSS BaseRecsCount,Attr1Count;
Attr1BaseRatio = FOREACH Attr1BaseCross GENERATE CONCAT('Ratio of Not Null Attr1 to Total Base Recs: ',(chararray)((double)$1/(double)$0));

Attr2ValidRecs = FILTER BaseRecs BY Attr2 IS NOT NULL;
Attr2ValidRecs_all = GROUP Attr2ValidRecs ALL;
Attr2Count = FOREACH Attr2ValidRecs_all GENERATE COUNT(Attr2ValidRecs);
Attr2CountStr = FOREACH Attr2Count GENERATE CONCAT('Recs with Attr2 not null : ',(chararray)$0);

Attr2BaseCross = CROSS BaseRecsCount,Attr2Count;
Attr2BaseRatio = FOREACH Attr2BaseCross GENERATE CONCAT('Ratio of Not Null Attr2 to Total Base Recs: ',(chararray)((double)$1/(double)$0));

1 Answer


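Unfortunately, you can't define a single construct that stands in for a batch of several Pig statements. I've sometimes wished I could do that, so I sympathize.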

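In the past, when I've had to repeat something over and over in the same script, I've generated the Pig Latin code with a for loop in a Python script (or whatever else), substituting certain keywords. It still feels dirty, though.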
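A minimal sketch of that keyword-substitution approach, applied to the block from the question. The placeholder token `__ATTR__`, the function name `generate_pig`, and the attribute list are hypothetical; the generated text still assumes `BaseRecs` and `BaseRecsCount` are defined earlier in the Pig script it ends up in.

```python
"""Generate repetitive Pig Latin by substituting a keyword in a template.

Sketch only: __ATTR__, generate_pig, and the attribute list are
hypothetical. BaseRecs and BaseRecsCount must already be defined in the
Pig script that receives this output.
"""

# Template for one attribute; every occurrence of __ATTR__ is replaced.
TEMPLATE = """\
__ATTR__ValidRecs = FILTER BaseRecs BY __ATTR__ IS NOT NULL;
__ATTR__ValidRecs_all = GROUP __ATTR__ValidRecs ALL;
__ATTR__Count = FOREACH __ATTR__ValidRecs_all GENERATE COUNT(__ATTR__ValidRecs);
__ATTR__CountStr = FOREACH __ATTR__Count GENERATE CONCAT('Recs with __ATTR__ not null : ',(chararray)$0);

__ATTR__BaseCross = CROSS BaseRecsCount,__ATTR__Count;
__ATTR__BaseRatio = FOREACH __ATTR__BaseCross GENERATE CONCAT('Ratio of Not Null __ATTR__ to Total Base Recs: ',(chararray)((double)$1/(double)$0));
"""


def generate_pig(attrs):
    """Return the Pig Latin block repeated once per attribute name."""
    return "\n".join(TEMPLATE.replace("__ATTR__", attr) for attr in attrs)


if __name__ == "__main__":
    # Hypothetical attribute list: add a name here instead of copy-pasting Pig code.
    print(generate_pig(["Attr1", "Attr2"]))
```

Redirecting the printed output to a file (for example `python gen_pig.py > generated.pig`, both names hypothetical) gives you the expanded statements to paste into or combine with the rest of the script. Plain string replacement keeps the generator trivial, at the cost of the placeholder having to be a token that appears nowhere else in the Pig code.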

answered 2011-07-03T02:05:30.493