json - 如何在 pig 中使用 JsonLoader 加载数据

Question

假设我有一个这种形式的 json 文件：

    {"kind": "youtubeAnalytics#resultTable", "rows": [["2015-03-23", "a1jkKOrbYuQ", 141],["2015-03-24", "a1jkKOrbYuQ", 14]]}
    {"kind": "youtubeAnalytics#resultTable", "rows": [["2014-03-23", "a1jkKzubYuQ", 141],["2014-03-24", "a1jkKzubYuQ", 14]]}

以下是我编写的猪脚本，它只允许加载和转储文件：

    A = LOAD '/user/hdfs/youtube_data_views_' using JsonLoader('kind:chararray, rows:{field:(i1:chararray,i2:chararray,i3:int)}');
    DUMP A;

这是我得到的结果：

    (youtubeAnalytics#resultTable,)

实际上，我尝试了数十种元组和包的组合，以确保 A 被正确加载而不是部分加载。不幸的是，没有人工作。任何帮助将不胜感激

score 0 · Accepted Answer

事实证明，我需要做的就是使用大象鸟罐子......这是您需要注册的罐子列表：

    REGISTER 'hdfs://10.1.1.154:8020/user/hdfs/pig_jars/elephant-bird-core-4.4.jar';
    REGISTER 'hdfs://10.1.1.154:8020/user/hdfs/pig_jars/elephant-bird-hadoop-compat-4.4.jar';
    REGISTER 'hdfs://10.1.1.154:8020/user/hdfs/pig_jars/elephant-bird-pig-4.4.jar';
    REGISTER 'hdfs://10.1.1.154:8020/user/hdfs/pig_jars/google-collections-1.0-rc1.jar';
    REGISTER 'hdfs://10.1.1.154:8020/user/hdfs/pig_jars/json-simple-1.1.jar';

最后一个建议：避免使用内置的 JsonLoader 函数。您将花费时间来获得正确的架构......

json - 如何在 pig 中使用 JsonLoader 加载数据

1 回答 1

Related

Reference