
I am copying an HDFS snapshot to an S3 bucket and the job fails with the error below. The command I am running is:

    hadoop distcp /.snapshot/$SNAPSHOTNAME s3a://$ACCESSKEY:$SECRETKEY@$BUCKET/$SNAPSHOTNAME
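(Side note: embedding the keys in the URI leaks them into the job logs, as the trace below shows. They can instead be passed as configuration properties; a minimal sketch, assuming the connector honors Hadoop's standard fs.s3a.* credential keys:)

    # Sketch: supply credentials via -D properties instead of the bucket URI
    hadoop distcp \
      -D fs.s3a.access.key=$ACCESSKEY \
      -D fs.s3a.secret.key=$SECRETKEY \
      /.snapshot/$SNAPSHOTNAME \
      s3a://$BUCKET/$SNAPSHOTNAME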

15/08/20 06:50:07 INFO mapreduce.Job:  map 38% reduce 0%
15/08/20 06:50:08 INFO mapreduce.Job:  map 39% reduce 0%
15/08/20 06:52:15 INFO mapreduce.Job:  map 41% reduce 0%
15/08/20 06:52:37 INFO mapreduce.Job: Task Id : attempt_1439998402428_0006_m_000004_0, Status : FAILED
Error: java.io.IOException: File copy failed: hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0 --> s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0 to s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
        ... 10 more
Caused by: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at java.security.DigestOutputStream.write(DigestOutputStream.java:148)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.write(NativeS3FileSystem.java:293)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:255)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:184)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:124)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
        ... 11 more

15/08/20 06:53:13 INFO mapreduce.Job: Task Id : attempt_1439998402428_0006_m_000007_0, Status : FAILED
Error: java.io.IOException: File copy failed: hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c --> s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c to s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
        ... 10 more
Caused by: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at java.security.DigestOutputStream.write(DigestOutputStream.java:148)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.write(NativeS3FileSystem.java:293)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:255)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:184)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:124)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
        ... 11 more

But there is plenty of space on the device, roughly 4 TB free. Please help.


1 Answer


I ran into the same problem, and this is what finally worked for me:

hadoop distcp -D mapreduce.job.maxtaskfailures.per.tracker=1 ...

I tried a few things (with a colleague's help), but the main one, and really the key, was lowering the maximum task failures per tracker to 1. Essentially, a single node was running out of space, so this forces the job not to retry on a node where it has already failed; see the sketch below.
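For reference, a minimal sketch of a full invocation (reusing the question's placeholder variables; the source and destination paths are assumptions, not the exact job):

    # Sketch: schedule no further attempts on a node once a task has failed there
    # (mapreduce.job.maxtaskfailures.per.tracker defaults to 3 in Hadoop 2.x)
    hadoop distcp \
      -D mapreduce.job.maxtaskfailures.per.tracker=1 \
      /.snapshot/$SNAPSHOTNAME \
      s3a://$ACCESSKEY:$SECRETKEY@$BUCKET/$SNAPSHOTNAME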

Other things I tried that did not work:

 1. Increasing the number of mappers (-m).
 2. Increasing the number of retries from 3 to 12 (-D yarn.app.mapreduce.client.max-retries=12).
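Why the node was filling up: the stack trace shows NativeS3FsOutputStream writing through a local java.io.FileOutputStream, i.e. the s3n connector buffers each file on a worker's local disk (fs.s3.buffer.dir, by default under hadoop.tmp.dir) before uploading it. "No space left on device" therefore refers to that local volume, not to HDFS, which is why the ~4 TB free in the cluster doesn't help. A hedged sketch for checking and relocating that buffer (the /data1 path is a made-up example):

    # Where does the s3n connector buffer files locally?
    hdfs getconf -confKey fs.s3.buffer.dir      # default: ${hadoop.tmp.dir}/s3

    # On the failing worker, check free space on that volume
    # (the path below is a hypothetical resolved value of the property)
    df -h /tmp/hadoop-$USER/s3

    # Point the buffer at a larger volume for this job
    # (assumes /data1/s3buf exists on every worker and has space)
    hadoop distcp -D fs.s3.buffer.dir=/data1/s3buf \
      /.snapshot/$SNAPSHOTNAME s3a://$BUCKET/$SNAPSHOTNAME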

answered 2015-11-12T18:28:10.330