I am running a job in spark-shell with:
--num-executors 15
--driver-memory 15G
--executor-memory 7G
--executor-cores 8
--conf spark.yarn.executor.memoryOverhead=2G
--conf spark.sql.shuffle.partitions=500
--conf spark.sql.autoBroadcastJoinThreshold=-1
--conf spark.executor.memoryOverhead=800
The job gets stuck and does not make progress. The code does a cross join with a filter condition between a large dataset (270M rows) and a small table (100,000 rows). I have increased the number of partitions of the large table to 16000, and I have converted the small table into a broadcast variable.
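For reference, the join described above looks roughly like this (a minimal sketch only; the table names `large_table`/`small_table`, column names, and the filter condition are placeholders, not my actual code):

```scala
import org.apache.spark.sql.functions.broadcast

// large side: ~270M rows, repartitioned to 16000 partitions
val large = spark.table("large_table").repartition(16000)

// small side: ~100,000 rows, hinted for broadcast
val small = broadcast(spark.table("small_table"))

// cross join followed by a filter condition (placeholder predicate)
val joined = large.crossJoin(small)
  .filter(large("key_col") >= small("low") && large("key_col") < small("high"))
```

Note that `spark.sql.autoBroadcastJoinThreshold=-1` disables automatic broadcast joins, so the explicit `broadcast()` hint is what keeps the small table broadcast here.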
I have attached the Spark UI screenshots for this job.
Should I reduce the number of partitions, or increase the number of executors? Any ideas?
Thanks for your help.
![spark ui 1][1] ![spark ui 2][2] ![spark ui 3][3] After 10 hours:
Status: Tasks: 7341/16936 (16624 failed)
Checking the container error logs:
RM Home
NodeManager
Tools
Failed while trying to construct the redirect url to the log server. Log Server url may not be configured
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.
[50% complete UI 1][4] [50% complete UI 2][5]

[1]: https://i.stack.imgur.com/nqcys.png
[2]: https://i.stack.imgur.com/S2vwL.png
[3]: https://i.stack.imgur.com/81FUn.png
[4]: https://i.stack.imgur.com/h5MTa.png
[5]: https://i.stack.imgur.com/yDfKF.png