I have installed the Spark MongoDB connector in Databricks and tried to run the following sample code:
from pyspark.sql import SparkSession

my_spark = SparkSession \
    .builder \
    .appName("myApp") \
    .getOrCreate()

df = my_spark.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("uri", CONNECTION_STRING) \
    .load()
where CONNECTION_STRING has the following format:
mongodb://USERNAME:PASSWORD@testgp.documents.azure.com:10255/DATABASE_NAME.COLLECTION_NAME?ssl=true&replicaSet=globaldb
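For reference, this is the variant I also tried. It is a minimal sketch assuming the legacy mongo-spark-connector (2.x/3.x), where the source URI can be supplied once through Spark configuration via spark.mongodb.input.uri instead of a per-read .option("uri", ...); the placeholder values in the URI are the same as above:

from pyspark.sql import SparkSession

# Placeholder URI; USERNAME, PASSWORD, DATABASE_NAME and COLLECTION_NAME
# must be filled in with real values.
CONNECTION_STRING = "mongodb://USERNAME:PASSWORD@testgp.documents.azure.com:10255/DATABASE_NAME.COLLECTION_NAME?ssl=true&replicaSet=globaldb"

spark = (
    SparkSession.builder
    .appName("myApp")
    # With the legacy connector, reads pick up this URI by default.
    .config("spark.mongodb.input.uri", CONNECTION_STRING)
    .getOrCreate()
)

df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()

This produces the same failure for me, so the problem does not seem to be in how the URI is passed.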
but I am running into the following error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 15) (10.25.238.198 executor 0): java.io.InvalidClassException: com.mongodb.spark.rdd.partitioner.MongoPartition; local class incompatible: stream classdesc serialVersionUID = -2855217470084313385, local class serialVersionUID = -3413909316915051241
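From what I understand, an InvalidClassException with mismatched serialVersionUID values usually means the driver and the executors are loading two different builds of the same class, here com.mongodb.spark.rdd.partitioner.MongoPartition, so I suspect two connector versions (or a connector built for the wrong Scala version) on the classpath. Below is a minimal sketch of what I tried along those lines; the Maven coordinate and the Scala 2.12 suffix are assumptions that would need to match the cluster's Spark/Scala build, and on Databricks the library would normally be installed on the cluster rather than through spark.jars.packages:

from pyspark.sql import SparkSession

# Hypothetical sketch: pin a single connector version whose Scala suffix
# (_2.12 here, an assumption) matches the cluster, so the driver and the
# executors deserialize the same MongoPartition class.
spark = (
    SparkSession.builder
    .appName("myApp")
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.12:3.0.2")
    .getOrCreate()
)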
Has anyone run into this error before, and is there a possible solution?