hadoop - Hadoop: Pig error

Question

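I'm an absolute beginner with Hadoop and am only running some simple tests, but I don't find the error messages very helpful.

I have set up my Hadoop environment in single-node mode on a CentOS 6.4 VM with 4 GB of RAM.

I'm trying to run a simple Pig script on a 500 MB CSV file. I have two 500 MB files, and on the first one the script succeeds. On the second, which is roughly the same size but contains different data (more rows), I get an error when execution reaches about 60%.

Here is the (very simple) Pig script I'm using: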

records = LOAD 'trans2013.csv' USING PigStorage(',') AS
(podracun_v_breme,datum_transakcije,znesek_transakcije,oznaka_valute_transakcije,racun_v_dobro,naziv_prejemnika,maticna_stevilka,davcna_stevilka,sifra_pu,zr_sns_oe,namen);
transaction_recs = GROUP records ALL;
tot_trans = FOREACH transaction_recs GENERATE
SUM(records.znesek_transakcije);
STORE tot_trans INTO '/user/root/totaltransactions';

This is the error I get in the terminal:

2014-04-06 10:28:29,147 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 64% complete
2014-04-06 10:28:30,240 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2014-04-06 10:28:30,241 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1396637732046_0008 has failed! Stop running all dependent jobs
2014-04-06 10:28:30,241 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-04-06 10:28:30,460 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: AttemptID:

2014-04-06 10:28:30,461 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-04-06 10:28:30,463 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.6-alpha 0.11.1 root 2014-04-06 10:25:49 2014-04-06 10:28:30 GROUP_BY

Failed!

Failed Jobs:
JobId Alias Feature Message Outputs
job_1396637732046_0008 records,tot_trans,transaction_recs GROUP_BY,COMBINER Message: Job failed! /user/root/totaltransactions,

Input(s):
Failed to read data from "hdfs://localhost:8020/user/root/trans2013.csv"

Output(s):
Failed to produce result in "/user/root/totaltransactions"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1396637732046_0008

2014-04-06 10:28:30,463 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-04-06 10:28:30,491 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate exception from backed error: AttemptID:attempt_1396637732046_0008_m_000001_0 Info:Container killed by the ApplicationMaster.

Details at logfile: /root/pig_1396797945352.log

This is the error in the log:

Backend error message
---------
AttemptID:attempt_1396637732046_0008_m_000001_0 Info:Container killed by the ApplicationMaster.

Pig Stack Trace
---------------
ERROR 2997: Unable to recreate exception from backed error: AttemptID:attempt_1396637732046_0008_m_000001_0 Info:Container killed by the ApplicationMaster.

org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: AttemptID:attempt_1396637732046_0008_m_000001_0 Info:Container killed by the ApplicationMaster.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:217)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:149)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:400)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1249)
    at org.apache.pig.PigServer.execute(PigServer.java:1239)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:333)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:604)
    at org.apache.pig.Main.main(Main.java:157)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

score 4 · Accepted Answer

So ... I tried running the Pig script with the mapreduce option:

pig -x mapreduce script.pig

It still failed, but at least it produced a meaningful error. It seems I had to remove the header from the CSV file, because Pig was treating the header row as data. This seems to happen only when working with floating-point numbers: when I used the same script with integers, the header row was simply ignored.

So that was it. I first removed the header from the file, then ran the script against it, and it worked.
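
For anyone wanting to do the same, one possible way to drop the header row is sketched below. It is only an illustration: the file names sample.csv and sample_noheader.csv and the two-column sample rows are made up, and the real file has more columns.

```shell
#!/bin/sh
# Sketch: strip the first (header) line from a CSV before loading it in Pig.
# File names and sample rows are hypothetical, for illustration only.
set -eu

# Build a tiny stand-in file: one header row followed by two data rows.
printf '%s\n' \
  'datum_transakcije,znesek_transakcije' \
  '2013-01-05,10.50' \
  '2013-01-06,4.25' > sample.csv

# tail -n +2 prints the file starting at line 2, which skips the header.
tail -n +2 sample.csv > sample_noheader.csv

# Show the result: only the data rows remain.
cat sample_noheader.csv
```

For the file in the question the equivalent would presumably be tail -n +2 trans2013.csv written to a new file, which then has to be uploaded to HDFS again (for example with hadoop fs -put) before the script is rerun.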
