I tried this script on my nested data:

 `books = load 'data/book-seded-workings-reduced.json'
    using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');`

group_auth = group books by title;

maped = foreach group_auth generate group, books.authors;

fil = foreach maped generate flatten(books); DUMP fil;

But I get this error: A column needs to be projected from a relation for it to be used as a scalar

Any ideas?

1 Answer

books = load 'input.data'
    using JsonLoader('user_id:chararray,
                      type:chararray,
                      title:chararray,
                      year:chararray,
                      publisher:chararray,
                      authors:{(name:chararray)},source:chararray');

flatten_authors = foreach books generate title, FLATTEN(authors.name);

dump flatten_authors;

Output (input taken from the question "Loading JSON file with serde in Cloudera"):

(Modern Database Systems: The Object Model, Interoperability, and Beyond.,null)
(Inequalities: Theory of Majorization and Its Application.,Albert W. Marshall)
(Inequalities: Theory of Majorization and Its Application.,Ingram Olkin)
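
As for why the script in the question fails: after `maped` is built, its fields are `group` and the projected authors bag, and there is no field called `books` any more. Pig therefore reads `books` in `flatten(books)` as the relation `books` and tries to use it as a scalar, which is what the error message is complaining about. If the grouping by title really is needed, a minimal, untested sketch could look like the following. The aliases `author_bags`, `per_book` and `per_author` are names I made up for illustration, and the file name and schema are simply copied from the snippet above.

```pig
-- Sketch only: same file name and schema as in the answer above.
books = load 'input.data'
    using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');

group_auth = group books by title;

-- Name the projected field explicitly so later statements refer to a field
-- of this relation instead of the relation `books`.
maped = foreach group_auth generate group as title, books.authors as author_bags;

-- First flatten: one row per book in the group, each carrying its authors bag.
per_book = foreach maped generate title, flatten(author_bags) as authors;

-- Second flatten: one row per (title, author name).
per_author = foreach per_book generate title, flatten(authors) as name;

dump per_author;
```

Two flattens are needed here because grouping wraps each book's `authors` bag inside another bag. Without the `group` step, the single `FLATTEN(authors.name)` shown above is enough.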
Answered 2014-08-16T12:56:54.387