I have the following Pig script:

A = LOAD 'text_a.txt' USING PigStorage();
B = LOAD 'text_b.txt' USING PigStorage();
SOMETHING = FILTER A BY $0 matches 'SOMETHING';
FOOBAR = FILTER A BY $0 matches 'FOOBAR';

SOMETHING_B = JOIN SOMETHING BY key, B BY $1;
FOOBAR_B = JOIN FOOBAR BY key, B BY $1;
TEMP = JOIN SOMETHING_B BY key, FOOBAR_B by key;
OUT = FOREACH TEMP GENERATE SOMETHING_B::$1 - FOOBAR_B::$1; 
dump OUT;

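When this script runs, it looks like the data in A and B is read from the source twice. Is there any way to prevent it from being read a second time?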

1 Answer

First, you can check whether the data is actually read twice by adding "EXPLAIN OUT;" at the end of the script.

Looking at your script, it doesn't look like A and B are loaded twice.
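
As a rough sketch of the check described above: the statement below would go at the end of the script from the question, after OUT is defined. It assumes the rest of the script parses as written. EXPLAIN prints the logical, physical, and MapReduce plans for the alias, so you can look at how many load operators for 'text_a.txt' and 'text_b.txt' show up in each plan.

```pig
-- Appended after the "OUT = FOREACH TEMP GENERATE ..." line of the script above.
-- EXPLAIN shows the execution plans for OUT; it does not store or dump the data.
-- In the printed plans, count the load operators that reference
-- 'text_a.txt' and 'text_b.txt' to see how often each input is read.
EXPLAIN OUT;
```

If each file appears under a single load operator that feeds both branches, the input is being shared rather than read a second time.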

Answered 2015-04-30T16:20:29.480