I'm trying to run Spark jobs on a Dataproc cluster, but Spark won't start because YARN is misconfigured.
I get the following error when running "spark-shell" locally on the master node, as well as when submitting a job through the web GUI or with the gcloud command-line utility from my local machine:
15/11/08 21:27:16 ERROR org.apache.spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: Required executor memory (38281+2679 MB) is above the max threshold (20480 MB) of this cluster! Please increase the value of 'yarn.scheduler.maximum-allocation-mb'.
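For context, this is roughly how I'm launching things (the cluster name, class and jar below are placeholders for my actual job):

# on the master node
spark-shell

# from my local machine (placeholder cluster/class/jar)
gcloud dataproc jobs submit spark --cluster my-cluster \
    --class com.example.MyJob --jars gs://my-bucket/my-job.jar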
I tried modifying the value of yarn.scheduler.maximum-allocation-mb in /etc/hadoop/conf/yarn-site.xml, but it didn't change anything. I don't think the configuration is actually read from that file.
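For reference, the change I made looked roughly like this (the value is just an example; the property name is the one from the error message):

<!-- added inside the <configuration> element of /etc/hadoop/conf/yarn-site.xml -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>45056</value>
</property>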
I've tried several cluster configurations, in multiple zones (mainly in Europe), and I only got this to work with the low-memory machine type (4 cores, 15 GB of memory). In other words, the problem only appears on nodes configured with more memory than the default YARN allocation limit allows.
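For example (machine types and zone below are just illustrative), a cluster created like this works:

# low-memory cluster: runs the job without problems
gcloud dataproc clusters create my-cluster \
    --zone europe-west1-b \
    --master-machine-type n1-standard-4 \
    --worker-machine-type n1-standard-4 \
    --num-workers 2

whereas the same command with a higher-memory worker type (for example --worker-machine-type n1-highmem-8) produces the error above.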