When running PySpark from a Jupyter Notebook on Ubuntu, I sometimes run into failures on the Java side. What I want is to see the error from the Java side, because what I actually get is usually a very long, generic Python traceback that boils down to:
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/py4j/java_gateway.py", line 1207, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
This error can mean many different things, so on its own it is not helpful. Usually it means the Java process crashed, but I want to know exactly why.
An example of where I need these logs: I am trying to run RAPIDS on PySpark on a DGX-1 machine, but initializing the Spark context ends with Java crashing as above. This is not the only cause of these errors, but this code reliably triggers them on my side.
import pyspark
import os

# Jars and GPU discovery script for the RAPIDS Accelerator for Apache Spark
cudf = "cudf-0.17-cuda10-1.jar"
rapids = "rapids-4-spark_2.12-0.2.0.jar"
script = "getGpuResources.sh"
separator = ","

conf = pyspark.SparkConf()
conf.set("spark.jars", separator.join([cudf, rapids]))
conf.set("spark.plugins", "com.nvidia.spark.SQLPlugin")

# Driver/executor sizing (all 80 cores and 48g per JVM)
conf.set("spark.driver.memory", "48g")
conf.set("spark.executor.memory", "48g")
conf.set("spark.driver.cores", "80")
conf.set("spark.executor.cores", "80")
conf.set("spark.task.cpus", "80")
conf.set("spark.dynamicAllocation.enabled", "false")

# RAPIDS and GPU resource settings (8 GPUs on the DGX-1)
conf.set("spark.rapids.sql.concurrentGpuTasks", "8")
conf.set("spark.sql.extensions", "ai.rapids.spark.Plugin")
conf.set("spark.driver.resource.gpu.amount", "8")
conf.set("spark.driver.resource.gpu.discoveryScript", script)
conf.set("spark.executor.resource.gpu.amount", "8")
conf.set("spark.executor.resource.gpu.discoveryScript", script)
conf.set("spark.task.resource.gpu.amount", "8")

# Creating the context is where the Java side dies
sc = pyspark.SparkContext(appName="rapids", conf=conf)
My question: is there some way to capture the stdout/stderr of the Java process that PySpark launches (with pyspark/jupyter/Ubuntu) so I can see the real reason Java crashed?
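One thing I have considered, as a minimal sketch only: pointing the driver JVM at a custom log4j configuration so its log lands in a file I can open from the notebook. I am assuming here that spark.driver.extraJavaOptions set in the SparkConf actually reaches the driver JVM that pyspark starts, and the file paths are made up:

# Sketch, not verified: write a log4j 1.x config that logs the driver to a
# file, and pass it to the driver JVM before the SparkContext is created.
# Paths (/tmp/...) are hypothetical.
log4j_conf = """
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/tmp/pyspark-driver.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
"""
with open("/tmp/driver-log4j.properties", "w") as f:
    f.write(log4j_conf)

conf.set("spark.driver.extraJavaOptions",
         "-Dlog4j.configuration=file:/tmp/driver-log4j.properties")

But I am not sure this would catch a hard JVM crash rather than just the normal Spark log, and it still does not give me the raw stdout/stderr of the Java process, which is what I am really after.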