I have two data sets

Definition of schema A - Name, city, state
A= {
  Ram,    Sunnyvale,  CA
  Soju,   Austin,     TX
  Rathos, Bangalore,  Karnataka
  Mike,   Portland,   OR
}

B = {
    Ram,  Refund
    Soju, Refund
}

I would like to join these two tables on name (B has no state column) and get the following output:

Schema Definition - Name,City,State,RefundIssued (Yes/No)
  Ram,Sunnyvale,CA,yes
  Soju,Austin,TX,yes
  Rathos,Bangalore,Karnataka,no
  Mike,Portland,OR,no

I am not sure how to specify the extra column, or where the logic that fills it should go. This is what I have so far:

A = load 'data1.txt' using PigStorage(',') as (name: chararray,city: chararray,state: chararray);
B= load 'data2.txt' using PigStorage(',') as (name: chararray,type: chararray);
C = join A by name LEFT OUTER,B by name;  
D = foreach C generate A::name as firstname,B::type as charge_type;
--how to add new column which goes on refund issued as yes /no
store D into '1outdata.txt';

1 Answer

A = load 'data1.txt' using PigStorage(',') as (name: chararray,city: chararray,state: chararray);
B= load 'data2.txt' using PigStorage(',') as (name: chararray,type: chararray);
C = join A by name LEFT OUTER,B by name;  
D = foreach C generate A::name as name, A::city as city, A::state as state, (B::type == 'Refund' ? 'True' : 'False') as RefundIssued;

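Note that, because of the way bincond works, RefundIssued can be 'True', 'False', or null. It is null whenever B::type is null, which happens when the left join finds no match or when the field itself is null. If you want those nulls turned into 'False', use: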

E = foreach D generate name, city, state, (RefundIssued IS NULL ? 'False' : RefundIssued) as RefundIssued;
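
The two steps can probably be folded into a single foreach by testing for null before comparing. This is a minimal sketch, assuming the same input files and schemas as in the question. The 'yes'/'no' strings follow the output the question asks for, and 'outdata' is only a placeholder output path.

```pig
-- Load both inputs with the schemas given in the question.
A = load 'data1.txt' using PigStorage(',') as (name: chararray, city: chararray, state: chararray);
B = load 'data2.txt' using PigStorage(',') as (name: chararray, type: chararray);

-- Keep every row of A; B's fields are null where no name matches.
C = join A by name LEFT OUTER, B by name;

-- Handle the null case first, then compare, so the result is never null.
D = foreach C generate
        A::name  as name,
        A::city  as city,
        A::state as state,
        (B::type IS NULL ? 'no' : (B::type == 'Refund' ? 'yes' : 'no')) as RefundIssued;

-- 'outdata' is a placeholder path, not taken from the question.
store D into 'outdata' using PigStorage(',');
```

PigStorage does not trim whitespace, so this assumes the fields in the files have no padding around the commas. If they do, the join key and the 'Refund' comparison will not match as written.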
answered 2013-08-13T08:10:00.643