I have two tables that I want to join. table1 has the columns id and value.
table2 has the columns id and color.

final = join table1 by id, table2 by id;
dump final;

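The result is a table with the columns id, value, id, color. What I want is a table with the columns id, value, color. How do I drop the duplicate id column from this table?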

2 Answers

Run this final Pig script:

grunt> table1 = LOAD 'table1_input_path' USING PigStorage(',') as (id:int, value:int);
grunt> table2 = LOAD 'table2_input_path' USING PigStorage(',') as (id:int, color:chararray);
grunt> joinlevel = JOIN table1 BY id, table2 BY id;
grunt> final = FOREACH joinlevel GENERATE table1::id as id, table1::value as value, table2::color as color;
grunt> dump final;
Answered 2018-04-12T08:06:21.113

If you run DESCRIBE final; you will see that the schema looks like this:

final: {table1::id: chararray,table1::value: chararray,table2::id: chararray,table2::color: chararray}

To tell the two id columns apart, you can refer to them as table1::id and table2::id. So, to drop one of the duplicate columns, you can do the following:

A = FOREACH final GENERATE 
    table1::id AS id,
    table1::value AS value,
    table2::color AS color;

(I also renamed the fields to drop the table1:: and table2:: prefixes, since they are no longer needed.)

I could also do this:

A = FOREACH final GENERATE 
    table1::id AS id,
    value AS value,
    color AS color;

This does not raise an error, because value and color are unambiguous names.
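
By contrast, a bare id would be ambiguous after the join, because both inputs contribute a field with that name, so it needs the table1:: or table2:: prefix. Another way to avoid the names altogether is positional references. A minimal sketch, assuming the column order is the one shown by DESCRIBE above ($0 = table1::id, $1 = table1::value, $2 = table2::id, $3 = table2::color); the alias B is hypothetical:

```pig
-- Positional projection: keep $0, $1 and $3, skip $2 (the duplicate table2::id).
B = FOREACH final GENERATE
    $0 AS id,
    $1 AS value,
    $3 AS color;
DESCRIBE B;
```

Positional references break silently if the column order of either input changes, so the prefixed names are usually the safer choice.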

Answered 2018-04-09T21:49:51.150