
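Use Hadoop's Pig Latin to find the number of occurrences of each unique search string in a search engine log file. (Click here for a sample log file.) Please help me. Thanks in advance.

Pig script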

excitelog = load '/user/hadoop/input/excite-small.log' using PigStorage() AS
(encryptcode:chararray, numericid:int, searchstring:chararray);                                        

GroupBySearchString = GROUP excitelog by searchstring;    

searchStrFrq = foreach GroupBySearchString Generate group as searchstring,count(searchstring);

Error encountered

 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve count using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]

2 Answers


You need to do:

searchStrFrq = foreach GroupBySearchString Generate group as searchstring,
                                                COUNT(excitelog) as kount;

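This follows from how grouping works in Pig. GroupBySearchString is a bag of {group, excitelog} tuples, where excitelog is itself a bag of all the tuples that match that group. COUNT is a built-in UDF that takes a bag as input and returns the number of tuples in it. So COUNT(excitelog) gives you the number of tuples that match group.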
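
Putting it together, here is a minimal sketch of the whole script, assuming the same input path and schema as in the question. The DESCRIBE and DUMP statements are not part of the original script; they are added only so you can inspect the relations.

```pig
-- Load the log; PigStorage() with no argument splits fields on tabs.
excitelog = LOAD '/user/hadoop/input/excite-small.log' USING PigStorage()
            AS (encryptcode:chararray, numericid:int, searchstring:chararray);

-- One tuple per distinct search string: the key (group) plus a bag
-- named after the grouped relation (excitelog).
GroupBySearchString = GROUP excitelog BY searchstring;

-- Added for inspection: prints the schema, showing `group` and the bag `excitelog`.
DESCRIBE GroupBySearchString;

-- COUNT takes the bag itself, not a field inside it, and must be upper case.
searchStrFrq = FOREACH GroupBySearchString
               GENERATE group AS searchstring, COUNT(excitelog) AS kount;

-- Added for inspection: prints (searchstring, kount) pairs.
DUMP searchStrFrq;
```

Note that COUNT skips tuples whose first field is null; if those rows should be counted as well, COUNT_STAR(excitelog) is the variant to use.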

answered 2013-09-15T09:38:14.117

The function names PigStorage and COUNT are case-sensitive, so COUNT must be written in upper case, as follows:

wordcount = FOREACH grouped GENERATE group , COUNT(words);
answered 2017-02-26T11:29:27.600