I am using Dataproc to run Spark jobs. The problem is that my worker nodes don't have access to the textblob package. How can I fix this? I am writing the code in a Jupyter notebook with the PySpark kernel.
The error:
PythonException:
An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 588, in main
    func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 447, in read_udfs
    udfs.append(read_single_udf(pickleSer, infile, eval_type, runner_conf, udf_index=i))
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 249, in read_single_udf
    f, return_type = read_command(pickleSer, infile)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 69, in read_command
    command = serializer._read_with_length(file)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 160, in _read_with_length
    return self.loads(obj)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 430, in loads
    return pickle.loads(obj, encoding=encoding)
ModuleNotFoundError: No module named 'textblob'
The sample code that fails:
from pyspark.sql import functions as f
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

data = [{"Category": 'Aaaa'},
        {"Category": 'Bbbb'},
        {"Category": 'Cccc'},
        {"Category": 'Eeeee'}]
df = spark.createDataFrame(data)

def sentPackage(text):
    # Import inside the UDF so the module is resolved on the executor
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity

# sentiment.polarity is a float, so declare the return type as DoubleType
sentPackageUDF = udf(sentPackage, DoubleType())
df = df.withColumn("polarity", sentPackageUDF(f.col("Category")))
df.show()
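For reference, here is a minimal diagnostic sketch I can run from the same notebook to confirm the module is importable on the driver but missing on the executors (it assumes the `spark` session provided by the PySpark kernel; the task count of 8 is arbitrary):

# Diagnostic sketch: attempt the import inside executor tasks.
def try_import(_):
    try:
        import textblob  # noqa: F401
        return "textblob available"
    except ModuleNotFoundError as e:
        return str(e)

# Spread a few tasks across the cluster and report the result of each one.
results = spark.sparkContext.parallelize(range(8), 8).map(try_import).collect()
for r in results:
    print(r)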