
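I am trying to run tests for my pyspark code on my local Windows machine. Pytest hangs at the point where my test code creates a SparkSession. Do I have to install/configure Spark on my local machine for pytest to work? Eventually the tests will run as part of CI/CD; will I also have to configure Spark on the build machine? I have a related question, but it looks like the problem is not Visual Studio Code but pytest, since I get the same problem when I run pytest from the command line.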

Below is my test code:

# test code

from pyspark.sql import SparkSession, Row, DataFrame

import pytest

def test_poc():
    spark_session = SparkSession.builder.master('local[2]').getOrCreate()  # this line never returns when debugging the test
    spark_session.createDataFrame(data, schema)  # data and schema not shown here



Could you add the terminal output from your pyspark script? Knowing where it stops would help, and it may give us a clue about what is wrong with your setup.

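To at least check whether pyspark is installed correctly (you may still need to do more to be completely sure), you can run a script like the one below, saved in a Python file named sample_test.py: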

from pyspark import sql


spark = sql.SparkSession.builder \
        .appName("local-spark-session") \
        .getOrCreate()
        

Running it should print something like the following:

C:\Users\user\Desktop>python sample_test.py
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

C:\Users\user\Desktop>SUCCESS: The process with PID 16368 (child process of PID 12664) has been terminated.
SUCCESS: The process with PID 12664 (child process of PID 11736) has been terminated.
SUCCESS: The process with PID 11736 (child process of PID 6800) has been terminated.
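If the script hangs instead of printing the log4j lines above, it may help to collect a few environment facts before digging further. The sketch below is only a guess at what is worth checking: PySpark needs a Java runtime to start its JVM, and on Windows HADOOP_HOME (winutils) is often involved. `spark_env_report` is a hypothetical helper name, not part of pyspark.

```python
import importlib.util
import os
import platform
import shutil
import sys


def spark_env_report():
    """Hypothetical helper: gather facts that commonly matter when SparkSession startup hangs."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        # True if the pyspark package can be imported by this interpreter.
        "pyspark_installed": importlib.util.find_spec("pyspark") is not None,
        # Path to a java executable on PATH, or None if none is found.
        "java_on_path": shutil.which("java"),
        "JAVA_HOME": os.environ.get("JAVA_HOME"),
        "SPARK_HOME": os.environ.get("SPARK_HOME"),
        # Assumption: often relevant on Windows, where Spark looks for winutils.exe.
        "HADOOP_HOME": os.environ.get("HADOOP_HOME"),
    }


if __name__ == "__main__":
    for key, value in spark_env_report().items():
        print(f"{key}: {value}")
```

Running the same report on the CI/CD build machine shows whether it has the same prerequisites as your local machine.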

Below is a sample pyspark test using pytest, saved in a file named sample_test.py:

from pyspark import sql


spark = sql.SparkSession.builder \
        .appName("local-spark-session") \
        .getOrCreate()
        

def test_create_session():
    assert isinstance(spark, sql.SparkSession)
    assert spark.sparkContext.appName == 'local-spark-session'
    assert spark.version == '3.1.2'  # the Spark version on my machine; adjust to yours

You can then simply run it as follows:

C:\Users\user\Desktop>pytest -v sample_test.py
============================================= test session starts =============================================
platform win32 -- Python 3.6.7, pytest-6.2.5, py-1.10.0, pluggy-1.0.0 -- c:\users\user\appdata\local\programs\python\python36\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\user\Desktop
collected 1 item

sample_test.py::test_create_session PASSED                                                               [100%]

============================================== 1 passed in 4.51s ==============================================
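Because the tests will also run on a build machine, it can help to create the session in one place and always stop it afterwards. Below is a minimal sketch, assuming pyspark and a Java runtime are installed; `local_spark` is a hypothetical helper name, and pyspark is imported lazily so the module loads even where pyspark is missing.

```python
from contextlib import contextmanager


@contextmanager
def local_spark(app_name="local-spark-session", master="local[2]"):
    """Yield a local SparkSession and stop it when the with-block exits."""
    # Lazy import: nothing touches pyspark until the context is entered.
    from pyspark.sql import SparkSession

    session = (
        SparkSession.builder
        .master(master)
        .appName(app_name)
        .getOrCreate()
    )
    try:
        yield session
    finally:
        # Stop the underlying SparkContext even if the test body raised.
        session.stop()
```

Usage would look like `with local_spark() as spark:` followed by `assert spark.createDataFrame([(1, "a")], ["id", "label"]).count() == 1`. In pytest, the same body can live in a function decorated with `@pytest.fixture(scope="session")` that yields the session, so all tests share one JVM instead of starting a new one per test.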

C:\Users\user\Desktop>SUCCESS: The process with PID 4752 (child process of PID 9780) has been terminated.
SUCCESS: The process with PID 9780 (child process of PID 8988) has been terminated.
SUCCESS: The process with PID 8988 (child process of PID 20176) has been terminated.

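The example above works on Windows. My account is new, so I can't reply to your comment yet... Could you update your question with the messages/errors from the terminal, if there are any? By the way, which OS are you using?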

answered 2021-09-17T07:38:01.130