Could you add the terminal output from your pyspark script? Knowing where it starts would be helpful, and it may give us a clue about what is wrong in your setup.
At a minimum, to check whether pyspark is installed correctly (you may still need additional steps to be fully sure), you can run a script like the one below, saved in a Python file named sample_test.py:
from pyspark import sql

spark = sql.SparkSession.builder \
    .appName("local-spark-session") \
    .getOrCreate()
Running it should print something like the following:
C:\Users\user\Desktop>python sample_test.py
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
C:\Users\user\Desktop>SUCCESS: The process with PID 16368 (child process of PID 12664) has been terminated.
SUCCESS: The process with PID 12664 (child process of PID 11736) has been terminated.
SUCCESS: The process with PID 11736 (child process of PID 6800) has been terminated.
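If the script above fails before Spark even starts, a quick preliminary check is to verify that the pyspark package is importable at all. This is a minimal sketch, not part of the test above; `pyspark_installed` is a hypothetical helper name:

```python
# Minimal sketch: check that the pyspark package can be found on
# sys.path before attempting to start a session.
# `pyspark_installed` is a hypothetical helper, not part of the answer.
import importlib.util

def pyspark_installed():
    # find_spec returns None when the package is not importable
    return importlib.util.find_spec("pyspark") is not None

print(pyspark_installed())
```

If this prints False, the problem is the installation itself (pip environment, PATH, or interpreter mismatch) rather than the Spark session setup.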
Below is a sample pytest test for pyspark, saved in a file named sample_test.py:
from pyspark import sql

spark = sql.SparkSession.builder \
    .appName("local-spark-session") \
    .getOrCreate()

def test_create_session():
    assert isinstance(spark, sql.SparkSession)
    assert spark.sparkContext.appName == 'local-spark-session'
    assert spark.version == '3.1.2'
You can then simply run it like so:
C:\Users\user\Desktop>pytest -v sample_test.py
============================================= test session starts =============================================
platform win32 -- Python 3.6.7, pytest-6.2.5, py-1.10.0, pluggy-1.0.0 -- c:\users\user\appdata\local\programs\python\python36\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\user\Desktop
collected 1 item
sample_test.py::test_create_session PASSED [100%]
============================================== 1 passed in 4.51s ==============================================
C:\Users\user\Desktop>SUCCESS: The process with PID 4752 (child process of PID 9780) has been terminated.
SUCCESS: The process with PID 9780 (child process of PID 8988) has been terminated.
SUCCESS: The process with PID 8988 (child process of PID 20176) has been terminated.
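One caveat about the test above: the `spark.version == '3.1.2'` assertion is pinned to an exact release and will fail on any other Spark version. A looser alternative is to compare only the major version. This is a minimal sketch assuming Spark 3.x; `major_version` is a hypothetical helper, not part of the test above:

```python
# Sketch: pin only the major Spark version instead of an exact release.
# `major_version` is a hypothetical helper, not part of the test above.
def major_version(version_string):
    # "3.1.2" -> 3
    return int(version_string.split(".")[0])

# e.g. inside test_create_session, instead of spark.version == '3.1.2':
#     assert major_version(spark.version) == 3
print(major_version("3.1.2"))  # → 3
```

This keeps the test useful as a smoke test without tying it to the exact patch release installed on your machine.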
The examples above are for Windows. My account is new, so I cannot reply to your comment... Could you update your question to share the message/error from the terminal, if any? By the way, just wondering, which operating system are you using?