I want to use the REPLACE, SUBSTRING and INDEXOF functions in my Pig script, but I can't get them to work properly.
First case: REPLACE inside REGEX_EXTRACT_ALL:
data_split = FOREACH data GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, MY_REGULAR_EXPRESSION)) AS ( timestamp: chararray, url: chararray, REPLACE(url , '.*?://', '') AS clean_url: chararray);
I want to use REPLACE to strip the leading http:// from the URL. In this case, I get:
Error during parsing. Encountered " "(" "( ""
Second case: reusing an output:
ws = FOREACH data_split {
    clean_url = REPLACE(url , '.*?://', '');
    url_index = INDEXOF(clean_url, '/');
    web_server = SUBSTRING(clean_url, 0, url_index);
    GENERATE web_server, timestamp, ip ;
};
This case does not work either: when I try to reuse clean_url from the earlier REPLACE call, I get:
Attempt to give operator of type
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc
multiple outputs. This operator does not support multiple outputs.
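
For comparison, here is one arrangement that might avoid both errors. It is an untested sketch and rests on two assumptions: that the parse error comes from placing a REPLACE expression inside the AS clause, which only accepts a schema, and that the second error comes from referencing the clean_url alias twice inside one nested block. The relation name cleaned and the three-argument form of INDEXOF are illustrative choices, and ip is left out because it does not appear in the schema shown above.

```pig
-- Step 1: the AS clause only names and types the extracted fields.
data_split = FOREACH data GENERATE
    FLATTEN(REGEX_EXTRACT_ALL(line, MY_REGULAR_EXPRESSION))
    AS (timestamp: chararray, url: chararray);

-- Step 2: compute clean_url in its own FOREACH so it becomes a real field.
cleaned = FOREACH data_split GENERATE
    timestamp,
    REPLACE(url, '.*?://', '') AS clean_url;

-- Step 3: clean_url is now a plain field, so it can be referenced more than once.
ws = FOREACH cleaned GENERATE
    SUBSTRING(clean_url, 0, INDEXOF(clean_url, '/', 0)) AS web_server,
    timestamp;
```

Note that INDEXOF returns -1 when the URL has no '/' after the host, so such rows would likely need to be filtered out or handled separately before the SUBSTRING call.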
Thanks.