Hi, I have data like this:
{"user_id":"kim95","type":"Book","title":"Modern Database Systems: The Object Model, Interoperability, and Beyond.","year":"1995","publisher":"ACM Press and Addison-Wesley","authors":[{"name":"null"}],"source":"DBLP"}
{"user_id":"marshallo79","type":"Book","title":"Inequalities: Theory of Majorization and Its Applications.","year":"1979","publisher":"Academic Press","authors":[{"name":"Albert W. Marshall"},{"name":"Ingram Olkin"}],"source":"DBLP"}
{"user_id":"knuth86a","type":"Book","title":"TeX: The Program","year":"1986","publisher":"Addison-Wesley","authors":[{"name":"Donald E. Knuth"}],"source":"DBLP"} ...
I want to get the publisher and the titles, then apply a count to each group, but I get the error 'a column need be...' with this script:
books = load 'data/book-seded-workings-reduced.json'
using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');
doc = group books by publisher;
res = foreach doc generate group,books.title,count(books.publisher);
DUMP res;
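To make the intended result of the first query concrete, here is a plain-Python sketch (not Pig) that groups the three sample records by publisher and emits (publisher, titles, count). It is only a reference for what `DUMP res` is expected to print. On the Pig side, the counting function is probably meant to be `COUNT(books)` in upper case, since Pig's built-in function names are case-sensitive. The names `RAW` and `group_by_publisher` are hypothetical, chosen for this illustration.

```python
import json
from collections import defaultdict

# The three sample records from the question, one JSON object per line
# (the line-delimited layout that JsonLoader reads).
RAW = """\
{"user_id":"kim95","type":"Book","title":"Modern Database Systems: The Object Model, Interoperability, and Beyond.","year":"1995","publisher":"ACM Press and Addison-Wesley","authors":[{"name":"null"}],"source":"DBLP"}
{"user_id":"marshallo79","type":"Book","title":"Inequalities: Theory of Majorization and Its Applications.","year":"1979","publisher":"Academic Press","authors":[{"name":"Albert W. Marshall"},{"name":"Ingram Olkin"}],"source":"DBLP"}
{"user_id":"knuth86a","type":"Book","title":"TeX: The Program","year":"1986","publisher":"Addison-Wesley","authors":[{"name":"Donald E. Knuth"}],"source":"DBLP"}
"""


def group_by_publisher(lines):
    """Mimic GROUP books BY publisher, then emit (publisher, titles, count)."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            book = json.loads(line)
            groups[book["publisher"]].append(book["title"])
    # One row per publisher: the bag of titles plus the number of books
    # in the group (the value COUNT(books) would give in Pig).
    return [(pub, titles, len(titles)) for pub, titles in groups.items()]


if __name__ == "__main__":
    for row in group_by_publisher(RAW.splitlines()):
        print(row)
```

Each sample publisher appears only once, so every count here is 1; with the full file, the counts would reflect how many books each publisher has.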
For the second query, I want a structure like this: (name, year), title
So I tried this:
books = load 'data/book-seded-workings-reduced.json'
using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');
flat =group books by (generate FLATTEN((authors.name),year);
tab = foreach flat generate group, books.title;
DUMP tab;
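The intended (name, year) → title structure can likewise be sketched in plain Python (again not Pig, only a reference for the expected output): each book produces one key per author, which is what flattening the `authors` bag does, and titles are then grouped under those keys. In Pig, the flattening presumably has to happen in a separate `FOREACH` before the `GROUP`, because `GROUP ... BY` does not accept a `GENERATE` clause; for example `flat = FOREACH books GENERATE FLATTEN(authors.name) AS name, year, title;` followed by `GROUP flat BY (name, year)`. The names `RAW` and `group_by_author_year` are hypothetical.

```python
import json
from collections import defaultdict

# Same three sample records as in the question, one JSON object per line.
RAW = """\
{"user_id":"kim95","type":"Book","title":"Modern Database Systems: The Object Model, Interoperability, and Beyond.","year":"1995","publisher":"ACM Press and Addison-Wesley","authors":[{"name":"null"}],"source":"DBLP"}
{"user_id":"marshallo79","type":"Book","title":"Inequalities: Theory of Majorization and Its Applications.","year":"1979","publisher":"Academic Press","authors":[{"name":"Albert W. Marshall"},{"name":"Ingram Olkin"}],"source":"DBLP"}
{"user_id":"knuth86a","type":"Book","title":"TeX: The Program","year":"1986","publisher":"Addison-Wesley","authors":[{"name":"Donald E. Knuth"}],"source":"DBLP"}
"""


def group_by_author_year(lines):
    """Flatten the authors bag, then group titles by (name, year)."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            book = json.loads(line)
            # One (name, year) key per author, like FLATTEN(authors.name):
            # a book with two authors contributes its title to two groups.
            for author in book["authors"]:
                groups[(author["name"], book["year"])].append(book["title"])
    return dict(groups)


if __name__ == "__main__":
    for key, titles in group_by_author_year(RAW.splitlines()).items():
        print(key, titles)
```

Note that the kim95 record stores the literal string "null" as its author name, so it forms its own ("null", "1995") group rather than being dropped.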
But that doesn't work either...
Any ideas?