When I run the following code in a Python script and execute it directly with python, I get the error below. When I start a pyspark session instead, then import Koalas, create a DataFrame, and call head(), it works fine and gives the expected output.
Is there a specific way the SparkSession needs to be set up for Koalas to work?
from pyspark.sql import SparkSession
import pandas as pd
import databricks.koalas as ks

# Build a local SparkSession before creating any Koalas objects
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("Pycedro Spark Application") \
    .getOrCreate()

# Create a Koalas DataFrame and print its first rows
kdf = ks.DataFrame({"a": [4, 5, 6],
                    "b": [7, 8, 9],
                    "c": [10, 11, 12]})

print(kdf.head())
Error when running it in a Python script:
File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 586, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 69, in read_command
command = serializer._read_with_length(file)
File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 160, in _read_with_length
return self.loads(obj)
File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 430, in loads
return pickle.loads(obj, encoding=encoding)
AttributeError: Can't get attribute '_fill_function' on <module 'pyspark.cloudpickle' from '/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/pyspark.zip/pyspark/cloudpickle/__init__.py'>
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:517)
[...]
Versions: Koalas 1.7.0, PySpark 3.0.2
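Possibly relevant: the traceback paths point at the Homebrew Spark 3.1.1 under /usr/local/Cellar/apache-spark/3.1.1, while the pip-installed pyspark reports 3.0.2, so the workers may be deserializing with a different cloudpickle than the driver used. A minimal sketch I used to check which versions are actually in play (only standard Spark environment variables, nothing project-specific):

import os
import pyspark

# Driver-side pyspark installed via pip; reports 3.0.2 here
print("pyspark:", pyspark.__version__)

# Worker side: Spark resolves SPARK_HOME / PYSPARK_PYTHON if set;
# the traceback shows workers loading code from the 3.1.1 install
print("SPARK_HOME:", os.environ.get("SPARK_HOME"))
print("PYSPARK_PYTHON:", os.environ.get("PYSPARK_PYTHON"))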