I get an error when trying to write data to Redshift using PySpark on an EMR cluster.
df.write.format("jdbc") \
    .option("url", "jdbc:redshift://clustername.yyyyy.us-east-1.redshift.amazonaws.com:5439/db") \
    .option("driver", "com.amazon.redshift.jdbc42.Driver") \
    .option("dbtable", "public.table") \
    .option("user", user_redshift) \
    .option("password", password_redshift) \
    .mode("overwrite") \
    .save()
The error I get is:
py4j.protocol.Py4JJavaError: An error occurred while calling o143.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, , executor 1):
java.sql.SQLException: [Amazon](500310) Invalid operation: The session is read-only;
at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(Unknown Source)
at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse(Unknown Source)
at com.amazon.redshift.client.PGMessagingContext.handleMessage(Unknown Source)
at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass(Unknown Source)
at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass(Unknown Source)
at com.amazon.redshift.client.PGMessagingContext.getParameterDescription(Unknown Source)
at com.amazon.redshift.client.PGClient.prepareStatement(Unknown Source)
at com.amazon.redshift.dataengine.PGQueryExecutor.<init>(Unknown Source)
at com.amazon.redshift.dataengine.PGDataEngine.prepare(Unknown Source)
at com.amazon.jdbc.common.SPreparedStatement.<init>(Unknown Source)
...
I would appreciate any help. Thanks!