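I have been stuck on this problem for a long time. I have tried to solve it but could not, so I need some expert advice.

I am trying to load a sample tweet JSON file.

sample.json: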

{"filter_level":"low","retweeted":false,"in_reply_to_screen_name":"FilmFan","truncated":false,"lang":"en","in_reply_to_status_id_str":null,"id":689085590822891521,"in_reply_to_user_id_str":"6048122","timestamp_ms":"1453125782100","in_reply_to_status_id":null,"created_at":"Mon Jan 18 14:03:02 +0000 2016","favorite_count":0,"place":null,"coordinates":null,"text":"@filmfan hey its time for you guys follow @acadgild To #AchieveMore and participate in contest Win Rs.500 worth vouchers","contributors":null,"geo":null,"entities":{"symbols":[],"urls":[],"hashtags":[{"text":"AchieveMore","indices":[56,68]}],"user_mentions":[{"id":6048122,"name":"Tanya","indices":[0,8],"screen_name":"FilmFan","id_str":"6048122"},{"id":2649945906,"name":"ACADGILD","indices":[42,51],"screen_name":"acadgild","id_str":"2649945906"}]},"is_quote_status":false,"source":"<a href=\"https://about.twitter.com/products/tweetdeck\" rel=\"nofollow\">TweetDeck<\/a>","favorited":false,"in_reply_to_user_id":6048122,"retweet_count":0,"id_str":"689085590822891521","user":{"location":"India ","default_profile":false,"profile_background_tile":false,"statuses_count":86548,"lang":"en","profile_link_color":"94D487","profile_banner_url":"https://pbs.twimg.com/profile_banners/197865769/1436198000","id":197865769,"following":null,"protected":false,"favourites_count":1002,"profile_text_color":"000000","verified":false,"description":"Proud Indian, Digital Marketing Consultant,Traveler, Foodie, Adventurer, Data Architect, Movie Lover, Namo Fan","contributors_enabled":false,"profile_sidebar_border_color":"000000","name":"Bahubali","profile_background_color":"000000","created_at":"Sat Oct 02 17:41:02 +0000 2010","default_profile_image":false,"followers_count":4467,"profile_image_url_https":"https://pbs.twimg.com/profile_images/664486535040000000/GOjDUiuK_normal.jpg","geo_enabled":true,"profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","follow_request_sent":null,"url":null,"utc_offset":19800,"time_zone":"Chennai","notifications":null,"profile_use_background_image":false,"friends_count":810,"profile_sidebar_fill_color":"000000","screen_name":"Ashok_Uppuluri","id_str":"197865769","profile_image_url":"http://pbs.twimg.com/profile_images/664486535040000000/GOjDUiuK_normal.jpg","listed_count":50,"is_translator":false}}

I tried to load this JSON file using Elephant Bird.

Script:

REGISTER json-simple-1.1.1.jar 
REGISTER elephant-bird-2.2.3.jar 
REGISTER guava-11.0.2.jar 
REGISTER avro-1.7.7.jar
REGISTER piggybank-0.12.0.jar


twitter = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader();

B = foreach twitter generate (chararray)$0#'created_at' as created_at,(chararray)$0#'id' as id,(chararray)$0#'id_str' as id_str,(chararray)$0#'text' as text,(chararray)$0#'source' as source,com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'entities') as entities,(boolean)$0#'favorited' as favorited;

describe B;

Output:

B: {created_at: chararray,id: chararray,id_str: chararray,text: chararray,source: chararray,entities: map[chararray],favorited: boolean}

But when I try to DUMP B, the following error occurs:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B

Here is the complete log.

199 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2016-09-11 14:07:57,199 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2016-09-11 14:07:57,199 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2016-09-11 14:07:57,199 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Distributed cache not supported or needed in local mode. Setting key [pig.schematuple.local.dir] with code temp directory: /tmp/1473583077199-0
2016-09-11 14:07:57,206 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-09-11 14:07:57,207 [JobControl] INFO org.apache. WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2016-09-11 14:07:57,211 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-09-11 14:07:57,211 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-09-11 14:07:57,212 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2016-09-11 14:07:57,216 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local360376249_0009
2016-09-11 14:07:57,267 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/
288 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-09-11 14:07:57,290 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: twitter[20,10],B[21,4] C: R:
2016-09-11 14:07:57,291 [Thread-214] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2016-09-11 14:07:57,296 [Thread-214] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local360376249_0009
java.lang.Exception: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
    at com.twitter.elephantbird.pig.util.PigCounterHelper.incrCounter(PigCounterHelper.java:55)
    at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.incrCounter(LzoBaseLoadFunc.java:70)
    at com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:130)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-09-11 14:07:57,467 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local360376249_0009
2016-09-11 14:07:57,467 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases B,twitter
2016-09-11 14:07:57,467 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: twitter[20,10],B[21,4] C: R:
2016-09-11 14:07:57,468 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-09-11 14:07:57,468 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2016-09-11 14:07:57,468 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local360376249_0009 has failed! Stop running all dependent jobs
2016-09-11 14:07:57,468 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-09-11 14:07:57,469 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,469 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,469 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2016-09-11 14:07:57,470 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion    PigVersion    UserId    StartedAt    FinishedAt    Features
2.7.1.2.3.4.7-4    0.15.0.2.3.4.7-4    root    2016-09-11 14:07:57    2016-09-11 14:07:57    UNKNOWN

Failed!

Failed Jobs:
JobId    Alias    Feature    Message    Outputs
job_local360376249_0009    B,twitter    MAP_ONLY    Message: Job failed!    file:/tmp/temp252944192/tmp-470484503,

Input(s):
Failed to read data from "file:///root/PIG/PIG/sample.json"

Output(s):
Failed to produce result in "file:/tmp/temp252944192/tmp-470484503"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_local360376249_0009

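Please also explain how to use the jar files and which versions to use. I am confused about which version I need.

Some people say to use Elephant Bird, others say to use AVRO, but neither has worked for me.

Please help.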

Mohan

1 Answer

I figured it out myself. It was a jar version problem. Script:

REGISTER elephant-bird-core-4.1.jar 
REGISTER elephant-bird-pig-4.1.jar 
REGISTER elephant-bird-hadoop-compat-4.1.jar

It works fine.
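For context on why the jar versions matter: org.apache.hadoop.mapreduce.Counter is a class in Hadoop 1 and an interface in Hadoop 2, so a library compiled against Hadoop 1 throws IncompatibleClassChangeError when it runs on Hadoop 2.7.1, which is the version shown in the log. The elephant-bird-hadoop-compat jar exists to bridge that difference. Below is a minimal sketch of the full script with the three jars above. It assumes the jars sit in the working directory and that json-simple still has to be registered unless it is already on the classpath; the field list is trimmed for brevity and is not the exact script that was run.

-- Sketch only: jar locations and the json-simple line are assumptions.
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-pig-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.1.jar
-- Assumption: JsonLoader parses with json-simple, so keep it registered
-- unless your installation already provides it.
REGISTER json-simple-1.1.1.jar

-- Each input line must hold one complete JSON object.
twitter = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader();

-- $0 is the map produced by the loader; look fields up by key.
B = FOREACH twitter GENERATE
    (chararray)$0#'created_at' AS created_at,
    (chararray)$0#'id_str'     AS id_str,
    (chararray)$0#'text'       AS text,
    (chararray)$0#'source'     AS source;

DUMP B;

If DUMP still fails with the same IncompatibleClassChangeError, it is worth checking that no older elephant-bird jar (such as the 2.2.3 one from the question) is still being registered or picked up from the classpath.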

Answered 2016-09-12T06:49:35.267