I want to use the REPLACE, SUBSTRING, and INDEXOF functions in my Pig script, but I can't get them to work properly.
- Case 1: REPLACE inside REGEX_EXTRACT_ALL:

    data_split = FOREACH data GENERATE
        FLATTEN(REGEX_EXTRACT_ALL(line, MY_REGULAR_EXPRESSION))
        AS (timestamp: chararray, url: chararray, REPLACE(url, '.*?://', '') AS clean_url: chararray);

I want to use REPLACE to strip the leading http:// from the URL. In this case I get:
Error during parsing. Encountered " "(" "( ""
- Case 2: reusing the output:

    ws = FOREACH data_split {
        clean_url = REPLACE(url, '.*?://', '');
        url_index = INDEXOF(clean_url, '/');
        web_server = SUBSTRING(clean_url, 0, url_index);
        GENERATE web_server, timestamp, ip;
    }

This case does not work either. When I try to reuse clean_url from the previous REPLACE call, I get:
Attempt to give operator of type 
    org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc       
    multiple outputs.  This operator does not support multiple outputs.
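
To make the goal concrete, here is a sketch of the result I am after, written as a chain of separate FOREACH statements so that no function call sits inside an AS clause and no alias is reused inside a nested block. I have not confirmed that this runs on my Pig version. The relation names `cleaned` and `indexed` are made up for illustration, and MY_REGULAR_EXPRESSION is still a placeholder for the real pattern.

```pig
-- Step 1: extract the raw fields. The AS clause only names and types them.
data_split = FOREACH data GENERATE
    FLATTEN(REGEX_EXTRACT_ALL(line, MY_REGULAR_EXPRESSION))
    AS (timestamp: chararray, url: chararray);

-- Step 2: strip the scheme prefix (e.g. http://) in its own statement.
cleaned = FOREACH data_split GENERATE
    timestamp,
    REPLACE(url, '.*?://', '') AS clean_url;

-- Step 3: find the first '/' after the host (hypothetical intermediate relation).
indexed = FOREACH cleaned GENERATE
    timestamp,
    clean_url,
    INDEXOF(clean_url, '/', 0) AS url_index;

-- Step 4: keep everything before that '/' as the web server name.
ws = FOREACH indexed GENERATE
    SUBSTRING(clean_url, 0, url_index) AS web_server,
    timestamp;
```

URLs with no '/' after the host would make INDEXOF return -1, so they would probably need separate handling before the SUBSTRING step.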
Thanks.