I can't figure out how to call a Java UDF that takes a tuple as its input.
gsmCell = LOAD '$gsmCell' using PigStorage('\t') as
(branchId,
cellId: int,
lac: int,
lon: double,
lat: double
);
gsmCellFiltered = FILTER gsmCell BY cellId is not null and
lac is not null and
lon is not null and
lat is not null;
gsmCellFixed = FOREACH gsmCellFiltered GENERATE FLATTEN (pig.parser.GSMCellParser(* ) ) as
(cellId: int,
lac: int,
lon: double,
lat: double
);
When I wrap the input to GSMCellParser in (), the UDF receives Tuple(Tuple): Pig wraps all the fields into a tuple and then puts that tuple inside another tuple.
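The wrapping described above can be modelled in a few lines. This is only a sketch of the observed behaviour, not Pig's actual code: a `java.util.List` stands in for a Pig `Tuple`, and `ArgPacking`, `pack`, and the sample coordinate values are hypothetical, introduced purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in model only: a List<Object> plays the role of a Pig Tuple.
final class ArgPacking {
    // Models the behaviour seen in the UDF: every argument of the call
    // becomes one field of the single input tuple handed to the function.
    public static List<Object> pack(Object... args) {
        return new ArrayList<>(Arrays.asList(args));
    }

    public static void main(String[] unused) {
        // Like GSMCellParser(cellId, lac, lon, lat): four arguments, four fields.
        List<Object> flat = pack(1, 2, 37.6, 55.7);
        // Like GSMCellParser((cellId, lac, lon, lat)): one argument that is itself a tuple.
        List<Object> nested = pack(pack(1, 2, 37.6, 55.7));

        System.out.println(flat.size());                       // 4
        System.out.println(nested.size());                     // 1
        System.out.println(((List<?>) nested.get(0)).size());  // 4
    }
}
```

In this model the extra parentheses do not change the data, only its depth: the four fields end up one level further down.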
When I try to pass the list of fields instead, using * or $0.., I get an exception:
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1045:
<line 28, column 57> Could not infer the matching function for pig.parser.GSMCellParser as multiple or none of them fit. Please use an explicit cast.
at org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:761)
at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:88)
at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visitExpressionPlan(TypeCheckingRelVisitor.java:191)
at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:157)
at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:246)
What am I doing wrong? My goal is to feed my UDF a tuple that contains the list of fields (i.e. the tuple should have size 4: cellId, lac, lon, lat).
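One possible way to end up with a flat four-field tuple however the call is written is to unwrap a single nested tuple at the start of the UDF. The following is a minimal sketch under the same assumption that a `java.util.List` stands in for a Pig `Tuple`; `TupleShape` and `unwrap` are hypothetical names, not part of Pig or of GSMCellParser.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in model only: a List plays the role of a Pig Tuple.
final class TupleShape {
    // If the input holds exactly one field and that field is itself a tuple,
    // return the inner tuple; otherwise return the input unchanged.
    public static List<?> unwrap(List<?> input) {
        if (input.size() == 1 && input.get(0) instanceof List) {
            return (List<?>) input.get(0);
        }
        return input;
    }

    public static void main(String[] args) {
        List<Object> fields = Arrays.<Object>asList(1, 2, 37.6, 55.7);
        List<Object> wrapped = Arrays.<Object>asList(fields);

        System.out.println(unwrap(fields).size());   // 4
        System.out.println(unwrap(wrapped).size());  // 4
    }
}
```

The trade-off is that a call which legitimately passes one tuple-typed field would be unwrapped too, so this only fits a function that always expects the flat field list.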
UPD: I tried GROUP ALL:
--filter non valid records
gsmCellFiltered = FILTER gsmCell BY cellId is not null and
lac is not null and
lon is not null and
lat is not null and
azimuth is not null and
angWidth is not null;
gsmCellFilteredGrouped = GROUP gsmCellFiltered ALL;
--fix records
gsmCellFixed = FOREACH gsmCellFilteredGrouped GENERATE FLATTEN (pig.parser.GSMCellParser($1)) as
(cellId: int,
lac: int,
lon: double,
lat: double,
azimuth: double,
ppw,
midDist: double,
maxDist,
cellType: chararray,
angWidth: double,
gen: chararray,
startAngle: double
);
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1045:
<line 27, column 64> Could not infer the matching function for pig.parser.GSMCellParser as multiple or none of them fit. Please use an explicit cast.
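The input schema of this UDF is: Tuple. I don't understand it. A tuple is an ordered set of fields. The LOAD function gives me tuples, and I want to pass the whole tuple to my UDF.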