I am running a job in spark-shell with:
--num-executors 15
--driver-memory 15G
--executor-memory 7G
--executor-cores 8
--conf spark.yarn.executor.memoryOverhead=2G
--conf spark.sql.shuffle.partitions=500
--conf spark.sql.autoBroadcastJoinThreshold=-1
--conf spark.executor.memoryOverhead=800
The job gets stuck and does not make progress. The code does a cross join with a filter condition between a large dataset (270M rows) and a small table (100,000 rows). I have increased the number of partitions of the large table to 16000, and I have converted the small table into a broadcast variable.
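For reference, the join described above looks roughly like this (a minimal sketch only; the table names `large_table`/`small_table`, column names, and the filter condition are placeholders, not my actual code):

```scala
import org.apache.spark.sql.functions.broadcast

// large side: ~270M rows, repartitioned to 16000 partitions
val large = spark.table("large_table").repartition(16000)

// small side: ~100,000 rows, hinted for broadcast
val small = broadcast(spark.table("small_table"))

// cross join followed by a filter condition (placeholder predicate)
val joined = large.crossJoin(small)
  .filter(large("key_col") >= small("low") && large("key_col") < small("high"))
```

Note that `spark.sql.autoBroadcastJoinThreshold=-1` disables automatic broadcast joins, so the explicit `broadcast()` hint is what keeps the small table broadcast here.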
I have attached the Spark UI screenshots for this job.
Should I reduce the number of partitions, or increase the number of executors? Any ideas?
Thanks for your help.
![spark ui 1][1] ![spark ui 2][2] ![spark ui 3][3] After 10 hours:
Status: Tasks: 7341/16936 (16624 failed)
Checking the container error logs:
RM Home
NodeManager
Tools
Failed while trying to construct the redirect url to the log server. Log Server url may not be configured
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.
[50% complete UI 1][4] [50% complete UI 2][5]

[1]: https://i.stack.imgur.com/nqcys.png
[2]: https://i.stack.imgur.com/S2vwL.png
[3]: https://i.stack.imgur.com/81FUn.png
[4]: https://i.stack.imgur.com/h5MTa.png
[5]: https://i.stack.imgur.com/yDfKF.png