I recently ran into this problem at work. It concerns FLATTEN in Pig, and I'll illustrate it with a simple example.
Two files:
===file1===
1_a
2_b
4_d
===file2 (tab-delimited)===
1 a
2 b
3 c
Pig script 1:
a = load 'file1' as (str:chararray);
b = load 'file2' as (num:int, ch:chararray);
a1 = foreach a generate flatten(STRSPLIT(str,'_',2)) as (num:int, ch:chararray);
c = join a1 by num, b by num;
dump c; -- exception java.lang.String cannot be cast to java.lang.Integer
Pig script 2:
a = load 'file1' as (str:chararray);
b = load 'file2' as (num:int, ch:chararray);
a1 = foreach a generate flatten(STRSPLIT(str,'_',2)) as (num:int, ch:chararray);
a2 = foreach a1 generate (int)num as num, ch as ch;
c = join a2 by num, b by num;
dump c; -- exception java.lang.String cannot be cast to java.lang.Integer
Pig script 3:
a = load 'file1' as (str:chararray);
b = load 'file2' as (num:int, ch:chararray);
a1 = foreach a generate flatten(STRSPLIT(str,'_',2));
a2 = foreach a1 generate (int)$0 as num, $1 as ch;
c = join a2 by num, b by num;
dump c; -- works
I don't understand why scripts 1 and 2 fail while script 3 works. I'd also like to know whether there is a more concise way to get relation c. Thanks.
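One possibly more concise route is a minimal sketch that assumes every line of file1 has exactly the form `<int>_<chars>` with a single '_'. It skips STRSPLIT and FLATTEN entirely: PigStorage('_') splits each line on the underscore. The loader hands back untyped (bytearray) fields, so the declared `num:int` should be applied as a genuine cast at load time, not just relabel a String. Treat this as an untested sketch, not a confirmed fix.

```pig
-- Sketch: let the loader split file1 on '_' instead of calling STRSPLIT.
-- Assumes each line of file1 is "<int>_<chars>" with exactly one '_'.
a = load 'file1' using PigStorage('_') as (num:int, ch:chararray);
-- file2 is tab-delimited, which is PigStorage's default.
b = load 'file2' as (num:int, ch:chararray);
c = join a by num, b by num;
dump c;
```

If file1's format is less regular (for example, more than one '_' per line), this sketch does not apply. Keeping the explicit `(int)$0` cast from script 3 remains the safer choice.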