
I admit the title of this question is unclear. If someone can reword it after reading the question, that would be great.

Anyway, I have a pair of fields that are word IDs, and I want to replace each ID with the word's text. Right now I am doing two JOINs and two FOREACHes, as follows:

WordIDs = LOAD 'wordID.txt' AS (wordID1:long, wordID2:long);
WordTexts = LOAD 'wordText.txt' AS (wordID:long, wordText:chararray);

Join1 = JOIN WordIDs BY wordID1, WordTexts BY wordID;
Replaced1 = FOREACH Join1 GENERATE WordTexts::wordText As wordText1, WordIDs::wordID2;

Join2 = JOIN Replaced1 BY wordID2, WordTexts BY wordID;
Replaced2 = FOREACH Join2 GENERATE Replaced1::wordText1 AS wordText1, WordTexts::wordText AS wordText2;

Is there any way to do this with fewer statements (for example, one join instead of two)?


1 Answer


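I think your current code will generate two separate MapReduce jobs. To avoid that, use a replicated join. It does not reduce the number of JOIN statements, but both joins become map-side joins, so there should be only one MapReduce job. Note that a replicated join requires the replicated relation (here WordTexts, the one listed last) to be small enough to fit in memory. The code should look something like this (I have not run it):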

WordIDs = LOAD 'wordID.txt' AS (wordID1:long, wordID2:long);
WordTexts = LOAD 'wordText.txt' AS (wordID:long, wordText:chararray);

Join1 = JOIN WordIDs BY wordID1, WordTexts BY wordID USING 'replicated';
Join2 = JOIN Join1 BY wordID2, WordTexts BY wordID USING 'replicated';

Replaced = FOREACH Join2 GENERATE Join1::WordTexts::wordText AS wordText1, WordTexts::wordText AS wordText2;
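
Since the script above is untested, one way to check how many MapReduce jobs it really compiles to is Pig's EXPLAIN operator. This is only a sketch: it is meant to be appended to the script above and assumes the alias Replaced is already defined there.

```pig
-- Append after the script above; the alias Replaced must already be defined.
-- EXPLAIN prints the logical, physical, and MapReduce plans for the alias
-- without executing the job, so the number of MapReduce jobs can be read
-- from the MapReduce plan section of the output.
EXPLAIN Replaced;
```

If the plan still shows more than one job, the replicated joins are probably not being applied as expected.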
answered 2013-01-20T21:54:18.880