I'm getting the error below saying the file can't be found. Well... the file does exist. I'm new to distcp. FYI, I'm using Cloudera.

 https://s3.amazonaws.com/test-development/test/201305031003_0_ubuntu.gz


ubuntu@ubuntu:~$ hadoop distcp -i 201305031003_0_ubuntu.gz s3://id:key@test-development/test/201305031003_0_ubuntu.gz
13/05/04 14:54:29 INFO tools.DistCp: srcPaths=[201305031003_0_ubuntu.gz]
13/05/04 14:54:29 INFO tools.DistCp: destPath=s3://id:key@test-development/test/201305031003_0_ubuntu.gz
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source 201305031003_0_ubuntu.gz does not exist.
    at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:641)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

1 Answer

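The first argument is the source, so it should be the S3 path, and that path should use s3n:// (native S3) rather than s3://, unless you wrote the data to S3 using s3:// (the block file system).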
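
As a rough sketch of that argument order, assuming the file is being read from S3 as described here: the bucket and object names come from the question, `id` and `key` are the question's credential placeholders, and the HDFS destination `hdfs:///user/ubuntu/` is a hypothetical path chosen only for illustration. The command is printed as a dry run rather than executed.

```shell
#!/bin/sh
# Placeholders taken from the question; substitute real AWS credentials.
AWS_ID="id"
AWS_KEY="key"
BUCKET="test-development"
OBJECT="test/201305031003_0_ubuntu.gz"

# s3n:// (native S3) addresses the object as an ordinary S3 file.
S3_URI="s3n://${AWS_ID}:${AWS_KEY}@${BUCKET}/${OBJECT}"

# Hypothetical HDFS destination directory (an assumption, not from the question).
HDFS_DIR="hdfs:///user/ubuntu/"

# distcp takes the source first and the destination second.
DISTCP_CMD="hadoop distcp ${S3_URI} ${HDFS_DIR}"

# Dry run: print the command. Run the printed command by hand once it looks right.
echo "$DISTCP_CMD"
```

Spelling out the scheme on both paths avoids relying on the cluster's default filesystem to interpret a bare file name.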

answered 2013-05-05T09:40:51.427