I am using a Dataproc cluster setup to test all of its features. I created a cluster template and have been running the create command almost daily, but this week it stopped working. The error printed is:
ERROR: (gcloud.dataproc.clusters.create) Operation [projects/cluster_name/regions/us-central1/operations/number] failed: Failed to initialize node cluster_name-m: Component ranger failed to activate post-hdfs See output in: gs://cluster_bucket/google-cloud-dataproc-metainfo/number/cluster_name-m/dataproc-post-hdfs-startup-script_output.
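To read that output I pulled the log from the bucket with something along these lines (the path is the one from the error message):

gsutil cat gs://cluster_bucket/google-cloud-dataproc-metainfo/number/cluster_name-m/dataproc-post-hdfs-startup-script_output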
In that output, the error I found is:
<13>Mar 3 12:22:59 google-dataproc-startup[15744]: <13>Mar 3 12:22:59 post-hdfs-activate-component-ranger[15786]: ERROR: Error CREATEing SolrCore 'ranger_audits': Unable to create core [ranger_audits] Caused by: Java heap space
<13>Mar 3 12:22:59 google-dataproc-startup[15744]: <13>Mar 3 12:22:59 post-hdfs-activate-component-ranger[15786]:
<13>Mar 3 12:22:59 google-dataproc-startup[15744]: <13>Mar 3 12:22:59 post-hdfs-activate-component-ranger[15786]: + exit_code=1
<13>Mar 3 12:22:59 google-dataproc-startup[15744]: <13>Mar 3 12:22:59 post-hdfs-activate-component-ranger[15786]: + [[ 1 -ne 0 ]]
<13>Mar 3 12:22:59 google-dataproc-startup[15744]: <13>Mar 3 12:22:59 post-hdfs-activate-component-ranger[15786]: + echo 1
<13>Mar 3 12:22:59 google-dataproc-startup[15744]: <13>Mar 3 12:22:59 post-hdfs-activate-component-ranger[15786]: + log_and_fail ranger 'Component ranger failed to activate post-hdfs'
The create command I run is:
gcloud dataproc clusters create cluster_name \
--bucket cluster_bucket \
--region us-central1 \
--subnet subnet_dataproc \
--zone us-central1-c \
--master-machine-type n1-standard-8 \
--master-boot-disk-size 500 \
--num-workers 2 \
--worker-machine-type n1-standard-8 \
--worker-boot-disk-size 1000 \
--image-version 2.0.29-debian10 \
--optional-components ZEPPELIN,RANGER,SOLR \
--autoscaling-policy=autoscale_rule \
--properties="dataproc:ranger.kms.key.uri=projects/gcp-project/locations/global/keyRings/kbs-dataproc-keyring/cryptoKeys/kbs-dataproc-key,dataproc:ranger.admin.password.uri=gs://cluster_bucket/kerberos-root-principal-password.encrypted,hive:hive.metastore.warehouse.dir=gs://cluster_bucket/user/hive/warehouse,dataproc:solr.gcs.path=gs://cluster_bucket/solr2,dataproc:ranger.cloud-sql.instance.connection.name=gcp-project:us-central1:ranger-metadata,dataproc:ranger.cloud-sql.root.password.uri=gs://cluster_bucket/ranger-root-mysql-password.encrypted" \
--kerberos-root-principal-password-uri=gs://cluster_bucket/kerberos-root-principal-password.encrypted \
--kerberos-kms-key=projects/gcp-project/locations/global/keyRings/kbs-dataproc-keyring/cryptoKeys/kbs-dataproc-key \
--project gcp-project \
--enable-component-gateway \
--initialization-actions gs://goog-dataproc-initialization-actions-us-central1/cloud-sql-proxy/cloud-sql-proxy.sh,gs://cluster_bucket/hue.sh,gs://goog-dataproc-initialization-actions-us-central1/livy/livy.sh,gs://goog-dataproc-initialization-actions-us-central1/sqoop/sqoop.sh \
--metadata livy-timeout-session='4h' \
--metadata "hive-metastore-instance=gcp-project:us-central1:hive-metastore" \
--metadata "kms-key-uri=projects/gcp-project/locations/global/keyRings/kbs-dataproc-keyring/cryptoKeys/kbs-dataproc-key" \
--metadata "db-admin-password-uri=gs://cluster_bucket/hive-root-mysql-password.encrypted" \
--metadata "db-hive-password-uri=gs://cluster_bucket/hive-mysql-password.encrypted" \
--scopes=default,sql-admin
I know this is related to the Ranger/Solr setup, but I don't know how to increase this heap size without writing a replacement initialization script or building a custom machine image. If you have any ideas on how to fix this, or need more information about my setup, please let me know.
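For reference, the only workaround I can think of is a small initialization action along these lines, which is exactly what I am trying to avoid. It also assumes that Solr on the Dataproc image reads its heap setting from /etc/default/solr.in.sh via SOLR_HEAP, which I have not verified:

#!/bin/bash
# Hypothetical init action (the approach I'd like to avoid): raise the Solr heap
# on the master node so the ranger_audits core can be created.
# The solr.in.sh path, SOLR_HEAP variable, and "solr" service name are assumptions
# about the Dataproc image, not verified.
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [[ "${ROLE}" == "Master" ]]; then
  echo 'SOLR_HEAP="4g"' >> /etc/default/solr.in.sh
  systemctl restart solr
fi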