I'm trying to run a PySpark job in my local environment.
After setting up pipenv and successfully installing a module (numpy), the module is still not visible to the code.
Should I be installing the library with pip instead of pipenv? What am I missing here?
The terminal output is shown below.
PS C:\Users\user\Desktop\spark\test> pipenv shell
Shell for C:\Users\user\.virtualenvs\test-sCQB0P3C already activated.
No action taken to avoid nested environments.
PS C:\Users\user\Desktop\spark\test> pipenv graph
numpy==1.20.3
pipenv==2020.11.15
  - certifi [required: Any, installed: 2020.12.5]
  - pip [required: >=18.0, installed: 21.1.1]
  - setuptools [required: >=36.2.1, installed: 56.0.0]
  - virtualenv [required: Any, installed: 20.4.6]
    - appdirs [required: >=1.4.3,<2, installed: 1.4.4]
    - distlib [required: >=0.3.1,<1, installed: 0.3.1]
    - filelock [required: >=3.0.0,<4, installed: 3.0.12]
    - six [required: >=1.9.0,<2, installed: 1.16.0]
  - virtualenv-clone [required: >=0.2.5, installed: 0.5.4]
pyspark==2.4.0
  - py4j [required: ==0.10.7, installed: 0.10.7]
PS C:\Users\user\Desktop\spark\test> spark-submit --master local[*] --files configs\etl_config.json jobs\etl_job.py
Traceback (most recent call last):
  File "C:/Users/user/Desktop/spark/test/jobs/etl_job.py", line 40, in <module>
    from dependencies.X import XLoader
  File "C:\Users\user\Desktop\spark\test\dependencies\X.py", line 2, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
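My guess is that spark-submit is not launching the pipenv interpreter at all. As a minimal diagnostic sketch (assuming I can add a few lines at the top of etl_job.py; these print statements are mine, not part of the original job), this would show which Python the job actually runs under:

    import sys

    # Show the interpreter that spark-submit actually launched, plus its
    # module search path; if this is not the virtualenv under
    # C:\Users\user\.virtualenvs\test-sCQB0P3C, numpy will not be importable.
    print(sys.executable)
    print("\n".join(sys.path))

If sys.executable turns out to be a system Python rather than the virtualenv, would setting the PYSPARK_PYTHON environment variable to the virtualenv's interpreter (or launching the job with pipenv run spark-submit ...) be the right way to make numpy visible?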