I installed databricks-connect on Windows 10 following the instructions here: https://docs.databricks.com/dev-tools/databricks-connect.html. After running databricks-connect configure and entering all the values, I ran databricks-connect test. This is the output I get, after which it hangs:
* PySpark is installed at c:\users\user\.conda\envs\myenv\lib\site-packages\pyspark
* Checking SPARK_HOME
* Checking java version
java version "1.8.0_251"
Java(TM) SE Runtime Environment (build 1.8.0_251-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.251-b08, mixed mode)
* Skipping scala command test on Windows
* Testing python command
The system cannot find the path specified.
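Before digging into pyspark itself, I sanity-checked the environment that the test command sees. A small script for that (which variables actually matter to databricks-connect is my assumption; these are just the usual suspects for a Spark-on-Windows setup):

```python
import os
import shutil

# NOTE: which of these variables databricks-connect actually requires is an
# assumption on my part; the test output above explicitly checks SPARK_HOME.
env_report = {
    var: os.environ.get(var, "<not set>")
    for var in ("SPARK_HOME", "HADOOP_HOME", "JAVA_HOME")
}
for var, val in env_report.items():
    print(f"{var} = {val}")

# On Windows, Hadoop needs winutils.exe to be reachable.
print("winutils.exe:", shutil.which("winutils.exe") or "<not on PATH>")
```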
Digging a little deeper, it seems the underlying pyspark package fails to initialize. It fails on these lines:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
When I try to run this manually, it hangs as well. My guess is that the problem is either with the local Spark installation or with the required Hadoop (and winutils.exe) setup, but databricks-connect requires a fresh pyspark installation (the docs say to uninstall pyspark before installing it).
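Since getOrCreate() hangs with no error message, one way to see where it is blocked is the standard-library faulthandler module, which can dump every thread's stack after a timeout. A self-contained sketch of the technique (time.sleep stands in for the hanging Spark startup; in the real case the dump would go to sys.stderr, placed just before getOrCreate(), with a longer timeout):

```python
import faulthandler
import tempfile
import time

# Write the stack dump to a temp file so it can be inspected afterwards.
# In the real debugging session, pass file=sys.stderr instead and place the
# call just before `SparkSession.builder.getOrCreate()`.
with tempfile.TemporaryFile(mode="w+") as dump:
    faulthandler.dump_traceback_later(0.5, file=dump)  # dump stacks after 0.5 s
    time.sleep(1.0)  # stand-in for the hanging Spark startup
    faulthandler.cancel_dump_traceback_later()  # reached only if we get past the hang
    dump.seek(0)
    trace = dump.read()

# The dump begins with a "Timeout (...)" header followed by thread stacks.
print(trace.splitlines()[0])
```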
I'd be glad for any pointers on:
- fixing the databricks-connect issue
- fixing the underlying pyspark installation