apache-spark - Console output of worker nodes to a file in Spark cluster mode

Question

I am running a pyspark job using spark-submit. The job runs successfully.

Now I am trying to collect the console output of this job into a file, as shown below.

spark-submit in yarn-client mode

spark-submit --master yarn-client --num-executors 5 --executor-cores 5 --driver-memory 5G --executor-memory 10G --files /usr/hdp/current/spark-client/conf/hive-site.xml --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar --py-files customer_profile/customer_helper.py#customer_helper.py,customer_profile/customer_json.json customer_profile/customer.py  > /home/$USER/logs/customer_2018_10_26 2>&1

In this mode I am able to redirect all console output to the file /home/$USER/logs/customer_2018_10_26, including all log levels and any stack trace errors.

spark-submit in yarn-cluster mode

spark-submit --master yarn-cluster --num-executors 5 --executor-cores 5 --driver-memory 5G --executor-memory 10G --files /usr/hdp/current/spark-client/conf/hive-site.xml --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar --py-files customer_profile/customer_helper.py#customer_helper.py,customer_profile/customer_json.json customer_profile/customer.py  > /home/$USER/logs/customer_2018_10_26 2>&1

In yarn-cluster mode I am unable to redirect the console output to the file /home/$USER/logs/customer_2018_10_26.

The problem is this: if my job fails in yarn-client mode, I can go to the file /home/$USER/logs/customer_2018_10_26 and easily find the error.

However, if my job fails in yarn-cluster mode, the stack trace is not written to the file /home/$USER/logs/customer_2018_10_26. The only way I can debug the error is by using yarn logs.
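For context, in yarn-cluster mode the driver runs inside a YARN container on the cluster, so its stdout and stderr go to the container logs and not to the console of the machine where spark-submit was launched. The yarn logs workaround could be scripted roughly as below. This is only a sketch under several assumptions: the redirected client output contains the YARN application ID (for example in an "Application report for application_..." line), YARN log aggregation is enabled, and the yarn CLI is on the PATH. The helper names extract_application_id and append_yarn_logs are hypothetical and not part of Spark or YARN.

```python
import re
import subprocess
from typing import Optional

# YARN application IDs have the form application_<cluster timestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def extract_application_id(client_output: str) -> Optional[str]:
    """Return the first YARN application ID found in the text, or None."""
    match = APP_ID_PATTERN.search(client_output)
    return match.group(0) if match else None


def append_yarn_logs(client_log_path: str) -> bool:
    """Append the aggregated container logs to the client log file.

    Returns False when no application ID can be found in the file.
    Assumes the yarn CLI is available and log aggregation is enabled.
    """
    with open(client_log_path, "r", errors="replace") as handle:
        app_id = extract_application_id(handle.read())
    if app_id is None:
        return False
    with open(client_log_path, "a") as handle:
        # Equivalent to: yarn logs -applicationId <app_id> >> client_log_path 2>&1
        subprocess.run(
            ["yarn", "logs", "-applicationId", app_id],
            stdout=handle,
            stderr=subprocess.STDOUT,
            check=False,
        )
    return True
```

Calling append_yarn_logs on the redirected log file after spark-submit returns would place the driver's stack trace in the same file, although it still relies on yarn logs underneath.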

I want to avoid using the yarn logs option. Instead, when using yarn-cluster mode, I want to see the error stack trace in the file /home/$USER/logs/customer_2018_10_26 itself.

How can I achieve this?

