Hi, I have data like this:

{"user_id":"kim95","type":"Book","title":"Modern Database Systems: The Object Model, Interoperability, and Beyond.","year":"1995","publisher":"ACM Press and Addison-Wesley","authors":[{"name":"null"}],"source":"DBLP"}

{"user_id":"marshallo79","type":"Book","title":"Inequalities: Theory of Majorization and Its Application.","year":"1979","publisher":"Academic Press","authors":[{"name":"Albert W. Marshall"},{"name":"Ingram Olkin"}],"source":"DBLP"}

{"user_id":"knuth86a","type":"Book","title":"TeX: The Program","year":"1986","publisher":"Addison-Wesley","authors":[{"name":"Donald E. Knuth"}],"source":"DBLP"} ...

I want to get the publisher and title, and then apply a count on each group, but I get the error 'a column need be...' with this script:

books = load 'data/book-seded-workings-reduced.json'
    using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');

doc = group books by publisher;
res = foreach doc generate group,books.title,count(books.publisher);
DUMP res;    

For the second query, I want a structure like this: (name,year),title

So I tried this:

books = load 'data/book-seded-workings-reduced.json'
    using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');


flat =group books by (generate FLATTEN((authors.name),year);
tab = foreach flat generate group, books.title;
DUMP tab;

But it doesn't work either...

Any ideas, please?

2 Answers

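What error do you get when you try the first query? The built-in function COUNT must be written in all caps, and COUNT(group) cannot be called, because group is an internal identifier generated by Pig.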
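As a minimal sketch of that fix, here is the first query with only the function name changed to uppercase and the whole grouped bag passed to COUNT. The alias by_publisher and the field names publisher, titles, and n are my own choices for readability, not something from your script:

```pig
books = LOAD 'data/book-seded-workings-reduced.json'
    USING JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');

-- Collect all records that share a publisher into one bag per publisher.
by_publisher = GROUP books BY publisher;

-- Function names are case sensitive in Pig, so it must be COUNT, not count.
-- COUNT takes a bag; here it receives the grouped bag `books`.
res = FOREACH by_publisher GENERATE group AS publisher, books.title AS titles, COUNT(books) AS n;
DUMP res;
```

Passing books rather than books.publisher is a matter of taste here; both are bags with one tuple per record in the group.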

I get the following result when I run your first query:

(Academic Press,{(Inequalities: Theory of Majorization and Its Application.)},1)
(Addison-Wesley,{(TeX: The Program)},1)
(ACM Press and Addison-Wesley,{(Modern Database Systems: The Object Model, Interoperability, and Beyond.)},1)

The expected (name,year),title format can be achieved like this:

flat = foreach books generate FLATTEN(authors.name) as authorName, year, title;
tab = group flat by (authorName, year);
finaltab = foreach tab generate group, flat.title;
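If it is unclear why the flattening has to happen before the grouping, it may help to inspect the schema after each step. The following is only a sketch of the same three statements with DESCRIBE calls added, assuming the same JsonLoader schema as in the question:

```pig
books = LOAD 'data/book-seded-workings-reduced.json'
    USING JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');

-- FLATTEN un-nests the authors bag, giving one row per (author, book) pair.
flat = FOREACH books GENERATE FLATTEN(authors.name) AS authorName, year, title;
DESCRIBE flat;  -- prints the schema of the flattened relation

-- Grouping on two fields makes `group` a tuple (authorName, year).
tab = GROUP flat BY (authorName, year);
DESCRIBE tab;   -- prints the schema: the group key plus a bag named flat

-- Keep the key and project only the titles out of each group's bag.
finaltab = FOREACH tab GENERATE group, flat.title;
DUMP finaltab;
```

A book with two authors, such as the marshallo79 record, ends up in two groups, one per author.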
answered 2014-08-26T12:53:35.173

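The only problem I can see in the first script is that the function should be written COUNT (uppercase) rather than count.

If you use count in lowercase, you get this error:

Could not resolve count using imports: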

answered 2014-08-26T14:58:02.750