I want to create an Apache Spark DataFrame from an S3 resource. I have tried this against both AWS S3 and IBM Cloud Object Storage, and both fail with
org.apache.spark.util.TaskCompletionListenerException: Premature end of Content-Length delimited message body (expected: 2,250,236; received: 16,360)
I am running pyspark with
./pyspark --packages com.amazonaws:aws-java-sdk-pom:1.11.828,org.apache.hadoop:hadoop-aws:2.7.0
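For what it's worth, I get the same behavior from a standalone script instead of the pyspark shell; a minimal sketch (the app name is a placeholder, and spark.jars.packages is the programmatic equivalent of the --packages flag):

from pyspark.sql import SparkSession

# Build the session with the same connector dependencies as the shell command.
spark = (
    SparkSession.builder
    .appName("s3a-test")  # placeholder name
    .config(
        "spark.jars.packages",
        "com.amazonaws:aws-java-sdk-pom:1.11.828,org.apache.hadoop:hadoop-aws:2.7.0",
    )
    .getOrCreate()
)
sc = spark.sparkContext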
I set the S3 configuration for IBM with
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "xx")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "xx")
sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "s3.eu-de.cloud-object-storage.appdomain.cloud")
or for AWS with
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "xx")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", " xx ")
sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "s3.us-west-2.amazonaws.com")
In both cases, the following code:
df = spark.read.csv("s3a://drill-test/cases.csv")
fails with the same exception quoted above.
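For reference, a single self-contained script that reproduces the failure end to end (keys redacted, AWS endpoint shown; this sketch relies on the spark.hadoop.* passthrough described above):

from pyspark.sql import SparkSession

# Self-contained reproduction: same packages, same S3A settings, same read.
spark = (
    SparkSession.builder
    .appName("s3a-repro")  # placeholder name
    .config(
        "spark.jars.packages",
        "com.amazonaws:aws-java-sdk-pom:1.11.828,org.apache.hadoop:hadoop-aws:2.7.0",
    )
    .config("spark.hadoop.fs.s3a.access.key", "xx")
    .config("spark.hadoop.fs.s3a.secret.key", "xx")
    .config("spark.hadoop.fs.s3a.endpoint", "s3.us-west-2.amazonaws.com")
    .getOrCreate()
)

df = spark.read.csv("s3a://drill-test/cases.csv")
df.show()  # the TaskCompletionListenerException above surfaces once the file is actually read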