hadoop - How to fix intermittent file-not-found errors in Hive with the Tez engine

Question

When I run a query in Hive with the Tez engine, I intermittently get a FileNotFoundException.

ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1508808910527_45616_1_00, diagnostics=[Task failed, taskId=task_1508808910527_45616_1_00_000066, diagnostics=[TaskAttempt 0 failed, info=[Container container_e09_1508808910527_45616_01_000033 finished with diagnostics set to [Container failed, exitCode=-1000. File does not exist: hdfs://server02.corp.company.com:8020/tmp/hive/username/_tez_session_dir/b65ddde9-110e-47fc-ae1c-33a1f754f839/nzcodec.jar
java.io.FileNotFoundException: File does not exist: hdfs://server02.corp.company.com:8020/tmp/hive/username/_tez_session_dir/b65ddde9-110e-47fc-ae1c-33a1f754f839/nzcodec.jar
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
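
The trace passes through FSDownload, the YARN class that copies job resources onto a node, so the detail worth tracking across failed runs is which file was missing and which Tez session directory it belonged to. A minimal sketch for pulling that out of a diagnostics line, assuming the message format shown above; the helper name `missing_resource` is hypothetical and not part of Hive, Tez, or YARN:

```python
import posixpath
import re
from urllib.parse import urlparse

# Matches the "File does not exist: hdfs://..." message shown in the trace above.
MISSING_FILE = re.compile(r"File does not exist: (hdfs://\S+)")


def missing_resource(diagnostics):
    """Return details of the missing HDFS file, or None if no match is found.

    Hypothetical helper for comparing failed runs; it only parses text.
    """
    match = MISSING_FILE.search(diagnostics)
    if match is None:
        return None
    uri = match.group(1)
    path = urlparse(uri).path
    parts = path.split("/")
    session_id = None
    if "_tez_session_dir" in parts:
        idx = parts.index("_tez_session_dir")
        # The directory directly under _tez_session_dir is the session id.
        if idx + 1 < len(parts) - 1:
            session_id = parts[idx + 1]
    return {
        "uri": uri,
        "file": posixpath.basename(path),
        "session_id": session_id,
    }
```

Running this over the diagnostics of several failed attempts shows whether the missing file is always the same jar and whether the session directory changes between runs.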

The query selects data from a staging table, repartitions it, and writes it to a reporting table.

INSERT OVERWRITE TABLE ${reporting_table} PARTITION (day, app_name) select <all the fields> from ${staging_table} where day = '${day}'

The staged data is stored in Avro files and totals about 350 GB:

hadoop fs -du -h -s /staged-data/2017-11-02
350.7 G /staged-data/2017-11-02

I have run the same query against the same data set several times, and the failure is intermittent.

My YARN settings are as follows:

yarn.nodemanager.resource.memory-mb     83968
yarn.scheduler.minimum-allocation-mb    2048

My Tez settings for the query are as follows:

SET hive.execution.engine=tez;
SET tez.am.resource.memory.mb=2048;
SET hive.tez.container.size=2048;

SET hive.merge.tezfiles=true;
SET hive.merge.smallfiles.avgsize=128000000;
SET hive.merge.size.per.task=128000000;
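
As a sanity check on these numbers, the sketch below estimates how many Tez containers fit on one NodeManager by memory alone. It assumes container requests are rounded up to a multiple of yarn.scheduler.minimum-allocation-mb (as the Capacity Scheduler does) and ignores vcores and the application master; the function names are illustrative only.

```python
import math


def normalized_container_mb(requested_mb, min_alloc_mb):
    """Round a container request up to the next multiple of the scheduler minimum.

    Assumption: the scheduler normalizes requests this way.
    """
    multiples = max(1, math.ceil(requested_mb / min_alloc_mb))
    return multiples * min_alloc_mb


def containers_per_node(node_mb, requested_mb, min_alloc_mb):
    """Upper bound on concurrent containers per node, considering memory only."""
    return node_mb // normalized_container_mb(requested_mb, min_alloc_mb)


# Values taken from the settings above.
print(containers_per_node(node_mb=83968, requested_mb=2048, min_alloc_mb=2048))  # 41
```

With hive.tez.container.size=2048 this gives at most 41 containers per node, so each node may localize the session's resources for many containers over the life of the query.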

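I have worked through the suggestions at https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html, but I still see the problem. Adjusting the container size does not seem to help.

Is there another set of settings I can change to prevent this?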
