You need to configure a pyspark kernel.
On my server, Jupyter kernels are located in:
/usr/local/share/jupyter/kernels/
You can create a new kernel by creating a new directory:
mkdir /usr/local/share/jupyter/kernels/pyspark
Then create the kernel.json file - I'm pasting mine as a reference:
{
  "display_name": "pySpark (Spark 1.6.0)",
  "language": "python",
  "argv": [
    "/usr/local/bin/python2.7",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "PYSPARK_PYTHON": "/usr/local/bin/python2.7",
    "SPARK_HOME": "/usr/lib/spark",
    "PYTHONPATH": "/usr/lib/spark/python/lib/py4j-0.9-src.zip:/usr/lib/spark/python/",
    "PYTHONSTARTUP": "/usr/lib/spark/python/pyspark/shell.py",
    "PYSPARK_SUBMIT_ARGS": "--master yarn-client pyspark-shell"
  }
}
Adjust the paths and Python version, and your pyspark kernel is ready to use.
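If you'd rather script these steps, here is a minimal sketch (not part of the original answer) that writes an equivalent kernel.json. It targets the per-user kernel directory (~/.local/share/jupyter/kernels), which Jupyter also searches and which avoids needing root; the Spark and Python paths are the ones from the example above and will likely differ on your machine.

```python
import json
import os

# Per-user kernel location (no root needed); the system-wide
# /usr/local/share/jupyter/kernels/ path from above also works.
kernel_dir = os.path.expanduser("~/.local/share/jupyter/kernels/pyspark")
os.makedirs(kernel_dir, exist_ok=True)

# Same spec as the kernel.json above; adjust paths and
# Python/Spark versions for your environment.
spec = {
    "display_name": "pySpark (Spark 1.6.0)",
    "language": "python",
    "argv": [
        "/usr/local/bin/python2.7",
        "-m",
        "ipykernel",
        "-f",
        "{connection_file}",
    ],
    "env": {
        "PYSPARK_PYTHON": "/usr/local/bin/python2.7",
        "SPARK_HOME": "/usr/lib/spark",
        "PYTHONPATH": "/usr/lib/spark/python/lib/py4j-0.9-src.zip:/usr/lib/spark/python/",
        "PYTHONSTARTUP": "/usr/lib/spark/python/pyspark/shell.py",
        "PYSPARK_SUBMIT_ARGS": "--master yarn-client pyspark-shell",
    },
}

# Write the spec where Jupyter will find it.
with open(os.path.join(kernel_dir, "kernel.json"), "w") as f:
    json.dump(spec, f, indent=2)
```

After running it, `jupyter kernelspec list` should show the new pyspark kernel.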