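I am trying to use an S3 bucket as the input data for an Elastic MapReduce job flow. The S3 bucket does not belong to the same account as the EMR job flow. How and where should I specify the S3 bucket credentials so that the job can access that bucket? I tried the following format: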

s3n://<Access Key>:<Secret Key>@<BUCKET>

But it gives me the following error:

Exception in thread "main" java.lang.IllegalArgumentException: The bucket name parameter must be specified when listing objects in a bucket
at com.amazonaws.services.s3.AmazonS3Client.assertParameterNotNull(AmazonS3Client.java:2381)
at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:444)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:785)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.ensureBucketExists(Jets3tNativeFileSystemStore.java:80)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:83)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at org.apache.hadoop.fs.s3native.$Proxy1.initialize(Unknown Source)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:512)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1413)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:68)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1431)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:256)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:352)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:321)
at com.inmobi.appengage.emr.mapreduce.TestSession.main(TestSession.java:88)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
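One plausible cause of this specific exception, offered as an assumption because the question does not show the real key, is a `/` in the secret key. Apache Hadoop's s3n filesystem takes the bucket name from `URI.getHost()`. A `/` inside the secret ends the URI authority early, so `java.net.URI` falls back to a registry-based authority and `getHost()` returns `null`, which matches "The bucket name parameter must be specified". The sketch below uses only `java.net.URI`. The class name `S3nUriDemo` and the keys are hypothetical placeholders, and the link to Hadoop's bucket lookup is assumed rather than shown in the question.

```java
import java.net.URI;

/**
 * Sketch: shows how java.net.URI parses an s3n URI with embedded credentials.
 * Assumption: the S3 filesystem uses URI.getHost() as the bucket name,
 * as Apache Hadoop's s3n implementation does. Keys below are fake placeholders.
 */
public class S3nUriDemo {

    /** Returns URI.getHost(), i.e. what would be used as the bucket name. */
    static String bucketFrom(String uri) {
        // URI.create throws an unchecked IllegalArgumentException on bad syntax.
        return URI.create(uri).getHost();
    }

    public static void main(String[] args) {
        // Secret without '/': userinfo is "AKIDEXAMPLE:abc123", host is "my-bucket".
        String plainSecret = "s3n://AKIDEXAMPLE:abc123@my-bucket/input";
        // Secret with '/': the authority stops at "AKIDEXAMPLE:abc", which is not
        // a valid host:port, so the URI keeps a registry-based authority and no host.
        String slashSecret = "s3n://AKIDEXAMPLE:abc/123@my-bucket/input";

        System.out.println(bucketFrom(plainSecret)); // my-bucket
        System.out.println(bucketFrom(slashSecret)); // null
    }
}
```

Even when the secret has no `/`, embedding keys in the path exposes them in logs and job configuration. Keeping them in the Hadoop configuration instead avoids the parsing problem altogether.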

How do I specify the credentials?

1 Answer

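You should try adding these credentials to the core-site.xml file. You can either add the S3 credentials to the nodes manually or use a bootstrap action when you launch the cluster.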

You can launch the cluster with:

ruby elastic-mapreduce --create --alive --plain-output --master-instance-type m1.xlarge --slave-instance-type m1.xlarge --num-instances 11 --name "My Super Cluster" --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args -c,fs.s3.awsAccessKeyId=<access-key>,-c,fs.s3.awsSecretAccessKey=<secret-key>

This should override the default values that EMR sets based on the account that launched the cluster.
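The `--args` value for the configure-hadoop bootstrap action is a single comma-separated list of `-c,key=value` pairs, where `-c` targets core-site.xml as the answer describes. The sketch below rebuilds that string from a map. It is a hypothetical helper (`ConfigureHadoopArgs.buildArgs` is not part of any EMR tool). It rejects values containing commas because a comma would be read as an argument separator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: builds the comma-separated --args value for the
 * configure-hadoop bootstrap action in the "-c,key=value" form shown above.
 * The credential values are placeholders, not real keys.
 */
public class ConfigureHadoopArgs {

    static String buildArgs(Map<String, String> coreSiteProps) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, String> e : coreSiteProps.entrySet()) {
            if (e.getValue().contains(",")) {
                // A comma inside a value would split it into a separate argument.
                throw new IllegalArgumentException("value for " + e.getKey() + " contains a comma");
            }
            parts.add("-c"); // -c: set the following key=value in core-site.xml
            parts.add(e.getKey() + "=" + e.getValue());
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order so the output is predictable.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.s3.awsAccessKeyId", "<access-key>");
        props.put("fs.s3.awsSecretAccessKey", "<secret-key>");
        System.out.println("--args " + buildArgs(props));
    }
}
```

Running it prints the same `--args` value used in the command above. Because the question reads from `s3n://` paths, you may also need the Apache Hadoop s3n property names `fs.s3n.awsAccessKeyId` and `fs.s3n.awsSecretAccessKey` in the same map. Whether EMR requires them in addition to the `fs.s3.*` keys is an assumption worth checking on your AMI version.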

answered 2013-08-23T21:00:12.150