hadoop - Spark program runs very slowly on a cluster

Question

I am trying to run my PySpark job on a cluster with 2 nodes and 1 master node (each with 16 GB of RAM). I submitted the job with the following command.

spark-submit --master yarn --deploy-mode cluster --name "Pyspark" --num-executors 40 --executor-memory 2g CD.py

However, my code runs very slowly: parsing 8.2 GB of data takes almost 1 hour. I then tried changing my YARN configuration and modified the following properties.

yarn.scheduler.increment-allocation-mb = 2 GiB

yarn.scheduler.minimum-allocation-mb = 2 GiB

yarn.scheduler.maximum-allocation-mb = 2 GiB

After making these changes, my Spark job still runs very slowly and takes more than 1 hour to parse the 8.2 GB file.
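For context, here is a rough back-of-the-envelope check of the memory requested by the command above. It is only a sketch under stated assumptions: it assumes Spark's default executor memory overhead (the larger of 384 MiB or 10% of executor memory), treats the "2 nodes" as the YARN workers, and assumes each one offers its full 16 GB to YARN, which is optimistic. The function names are made up for illustration.

```python
MIB_PER_GIB = 1024


def container_mib(executor_memory_mib, min_overhead_mib=384):
    """Approximate YARN container size for one executor.

    Assumes the default overhead: max(384 MiB, 10% of executor memory).
    """
    overhead = max(min_overhead_mib, executor_memory_mib // 10)
    return executor_memory_mib + overhead


def total_request_mib(num_executors, executor_memory_mib):
    """Memory needed if every requested executor were actually granted."""
    return num_executors * container_mib(executor_memory_mib)


# Values taken from the question: --num-executors 40 --executor-memory 2g
per_container = container_mib(2 * MIB_PER_GIB)
requested = total_request_mib(40, 2 * MIB_PER_GIB)

# Assumption: 2 worker nodes, each giving all 16 GB to YARN.
cluster = 2 * 16 * MIB_PER_GIB

# yarn.scheduler.maximum-allocation-mb was set to 2 GiB in the question.
max_allocation = 2 * MIB_PER_GIB

print("per-executor container (MiB):", per_container)
print("total requested (MiB):", requested)
print("cluster memory (MiB):", cluster)
print("container fits max allocation:", per_container <= max_allocation)
```

Under these assumptions the 40 executors ask for roughly three times the memory the cluster has, so YARN could only start a fraction of them, and a 2 GiB executor plus its overhead is larger than a 2 GiB maximum allocation.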

score 1 · Accepted Answer

Could you try the following configuration?

spark.executor.memory 5g

spark.executor.cores 5

spark.executor.instances 3

spark.driver.cores 2
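One way to apply these settings is to pass each of them to spark-submit with --conf. The sketch below only assembles the command line; the helper name build_spark_submit is hypothetical, the master, deploy mode, job name and script are copied from the question, and whether these values actually fix the slowdown is not confirmed here.

```python
import shlex


def build_spark_submit(app, name, settings):
    """Assemble a spark-submit argument list (hypothetical helper)."""
    argv = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--name", name,
    ]
    # Each Spark property is passed as --conf key=value.
    for key, value in settings.items():
        argv += ["--conf", f"{key}={value}"]
    argv.append(app)
    return argv


# Settings suggested in this answer.
settings = {
    "spark.executor.memory": "5g",
    "spark.executor.cores": 5,
    "spark.executor.instances": 3,
    "spark.driver.cores": 2,
}

argv = build_spark_submit("CD.py", "Pyspark", settings)

# shlex.join (Python 3.8+) quotes arguments so the line can be pasted into a shell.
print(shlex.join(argv))
```

The same properties could instead go into spark-defaults.conf; passing them on the command line keeps the experiment limited to this one job.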
