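I'm writing a Pig script (my first) that loads a large text file. For each record in that file, the contents of one field need to be sent to a RESTful service for processing. Nothing needs to be evaluated or filtered: capture the data, send it off, and the script doesn't need anything back.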

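I'm assuming this kind of functionality requires a UDF, but I'm new enough to Pig that I'm not clear on what type of function I should build. My best guess is a store function, since the data ultimately gets stored somewhere, but there's more guesswork behind that conclusion than I'd like.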

Any insight or guidance would be much appreciated.

2 Answers

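Have you looked at how DBStorage does something similar? It's a Piggybank store function that writes each tuple to a database over JDBC; a custom store function modeled on it could send each record to your REST service instead, along these lines: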

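REGISTER rest-storage.jar;  -- hypothetical jar containing the custom RestStorage StoreFunc
everything = LOAD 'categories.txt' USING PigStorage() AS (category:chararray);
...
-- STORE takes the location as a string; the storage function goes in the USING clause
STORE ordered INTO 'https://...' USING RestStorage();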
answered 2010-09-30T05:53:58.820

Having never found even a hint of an answer to this, I decided to take a different approach. I'm using Pig to load and parse the large file, then streaming each record I care about to a PHP script for the additional processing that Pig doesn't seem able to handle cleanly.

It's still not complete (read: there's a great big, very unhappy bug in the mix), but I think the concept is solid; I just need to work out the implementation details.

everything = LOAD 'categories.txt' USING PigStorage() AS (category:chararray);
-- apply filter
-- apply filter
-- ...
-- apply last filter
ordered  = ORDER filtered_categories BY category;

-- pipe each record to the PHP script; `php -F` runs the file once per input line
streamed = STREAM ordered THROUGH `php -nF process_categories.php`;
DUMP streamed;
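Pig's STREAM operator can pipe records through any executable that reads stdin and writes stdout, not only PHP. Below is a minimal sketch of a stand-in for process_categories.php written in Python. It assumes the REST service accepts a JSON POST of the form `{"category": ...}` at a hypothetical `REST_ENDPOINT`, and that the category is the first field of each record (Pig's default streaming serializer emits tuples as tab-delimited lines). The script name, endpoint, and payload shape are all placeholders to adapt.

```python
import json
import sys
import urllib.request

# Hypothetical endpoint: replace with the real REST service URL.
REST_ENDPOINT = "https://example.com/api/categories"


def extract_field(line, index=0):
    """Return one field from a tab-delimited record, or None if it is missing."""
    fields = line.rstrip("\n").split("\t")
    return fields[index] if index < len(fields) else None


def build_request(url, value):
    """Build (but do not send) a JSON POST request carrying one value."""
    body = json.dumps({"category": value}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def main():
    for line in sys.stdin:
        value = extract_field(line)
        if not value:
            continue  # skip blank lines and records without the field
        req = build_request(REST_ENDPOINT, value)
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                status = resp.status
        except Exception as exc:  # report the failure instead of killing the task
            status = "error:" + type(exc).__name__
        # One tab-delimited output line per record, so DUMP shows what happened.
        print("%s\t%s" % (value, status))


if __name__ == "__main__":
    main()
```

It would be invoked the same way as the PHP script, e.g. ``STREAM ordered THROUGH `python3 process_categories.py`;``; on a cluster the script also has to reach the task nodes, which Pig's `DEFINE ... SHIP(...)` clause handles. Printing one status line per record is a design choice for debugging; the script could just as well print nothing, since the job doesn't need a result.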
answered 2010-09-29T11:31:18.217