I'm trying to run a DeltaStreamer job that pushes data to an S3 bucket, using the following command:
spark-submit \
--packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3 \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.endpoint=s3.ap-south-1.amazonaws.com \
--conf spark.hadoop.fs.s3a.access.key='AA..AA' \
--conf spark.hadoop.fs.s3a.secret.key='WQO..IOEI' \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
--table-type COPY_ON_WRITE \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--source-ordering-field cloud.account.id \
--target-base-path s3a://test \
--target-table test1_cow \
--props /var/demo/config/kafka-source.properties \
--hoodie-conf hoodie.datasource.write.recordkey.field=cloud.account.id \
--hoodie-conf hoodie.datasource.write.partitionpath.field=cloud.account.id \
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
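The same invocation can also be assembled programmatically. This makes it easier to vary the region/endpoint, and it keeps the access keys out of the shell history. Below is a minimal Python sketch. It assumes the caller supplies the credentials, for example from the standard `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables. The function name `build_deltastreamer_cmd` is hypothetical, and the endpoint is derived with the `s3.<region>.amazonaws.com` pattern used in the command above. The sketch only reproduces the flags shown above and does not change the request Spark sends to S3.

```python
from typing import List

# Hudi DeltaStreamer arguments, copied from the command in the question.
HUDI_ARGS = [
    "--table-type", "COPY_ON_WRITE",
    "--source-class", "org.apache.hudi.utilities.sources.JsonKafkaSource",
    "--source-ordering-field", "cloud.account.id",
    "--target-base-path", "s3a://test",
    "--target-table", "test1_cow",
    "--props", "/var/demo/config/kafka-source.properties",
    "--hoodie-conf", "hoodie.datasource.write.recordkey.field=cloud.account.id",
    "--hoodie-conf", "hoodie.datasource.write.partitionpath.field=cloud.account.id",
    "--schemaprovider-class", "org.apache.hudi.utilities.schema.FilebasedSchemaProvider",
]


def build_deltastreamer_cmd(region: str, bundle: str,
                            access_key: str, secret_key: str) -> List[str]:
    """Hypothetical helper: return the spark-submit argv as a list (no shell quoting needed)."""
    # Regional endpoint, e.g. "s3.ap-south-1.amazonaws.com" as in the question.
    endpoint = f"s3.{region}.amazonaws.com"
    s3a_conf = {
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
    }
    cmd = [
        "spark-submit",
        "--packages",
        "com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3",
    ]
    # One "--conf key=value" pair per S3A setting.
    for key, value in s3a_conf.items():
        cmd += ["--conf", f"{key}={value}"]
    # Main class and application jar come before the application's own arguments.
    cmd += ["--class", "org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer", bundle]
    return cmd + HUDI_ARGS
```

To run it, you could pass `os.environ["HUDI_UTILITIES_BUNDLE"]` and the two credential variables, then hand the resulting list to `subprocess.run(cmd, check=True)`.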
This fails with the following error:
Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 9..1, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: G..g=
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
...
I believe I'm using the correct S3 endpoint. Do I need to create an S3 access point? I'm using the versions mentioned in the Docker demo at https://hudi.apache.org/docs/docker_demo.html (https://github.com/apache/hudi/tree/master/docker).