I'm getting the following error when using MultiWorkerMirroredStrategy() to train a custom Estimator on Google AI Platform (CMLE):

ValueError: Unrecognized task_type: 'master', valid task types are: "chief", "worker", "evaluator" and "ps".

Both MirroredStrategy() and ParameterServerStrategy() work fine on AI Platform with their respective config.yaml files. I'm currently not specifying device scopes for any operations, nor am I passing a device filter in the session config, i.e. tf.ConfigProto(device_filters=device_filters).

The config.yaml file I'm using for training with MultiWorkerMirroredStrategy() is:

trainingInput:
  scaleTier: CUSTOM
  masterType: standard_gpu
  workerType: standard_gpu
  workerCount: 4

The masterType field is mandatory when submitting a training job on AI Platform.

Note: The error lists 'chief' as a valid task type and 'master' as invalid. I'm specifying tensorflow-gpu==1.14.0 in setup.py for the trainer package.

2 Answers

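I ran into the same problem. As far as I understand, the configuration values that MultiWorkerMirroredStrategy expects differ from those of the other strategies and from what CMLE provides by default: https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras#multi-worker_configuration

It does not support a "master" node and calls it "chief" instead. If you are running the job in a container, you can try the "useChiefInTfConfig" flag; see the documentation here: https://developers.google.com/resources/api-libraries/documentation/ml/v1/python/latest/ml_v1.projects.jobs.html

Otherwise, you could try hacking your TF_CONFIG manually: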

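  import os

  # AI Platform names the first node "master"; MultiWorkerMirroredStrategy expects "chief".
  TF_CONFIG = os.environ.get('TF_CONFIG')
  if TF_CONFIG and '"master"' in TF_CONFIG:
    os.environ['TF_CONFIG'] = TF_CONFIG.replace('"master"', '"chief"')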
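
A plain string replace would also touch any other occurrence of "master" in the variable. A slightly more careful variant parses the JSON and renames only the cluster key and the task type. This is a minimal sketch, assuming TF_CONFIG has the usual "cluster" and "task" layout; the function names are hypothetical and not part of TensorFlow or AI Platform:

```python
import json
import os


def rename_master_to_chief(tf_config_json):
    """Return a TF_CONFIG JSON string with the 'master' role renamed to 'chief'."""
    config = json.loads(tf_config_json)

    # Rename the cluster entry, e.g. {"master": [...]} -> {"chief": [...]}.
    cluster = config.get("cluster", {})
    if "master" in cluster and "chief" not in cluster:
        cluster["chief"] = cluster.pop("master")

    # Rename this process's own role if it is the master.
    task = config.get("task", {})
    if task.get("type") == "master":
        task["type"] = "chief"

    return json.dumps(config)


def patch_tf_config_env():
    """Rewrite os.environ['TF_CONFIG'] in place, if it is set (hypothetical helper)."""
    raw = os.environ.get("TF_CONFIG")
    if raw:
        os.environ["TF_CONFIG"] = rename_master_to_chief(raw)
```

As with the one-liner above, patch_tf_config_env() would presumably need to run on every node before the strategy or RunConfig is created, since that is when TF_CONFIG gets read.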

answered 2019-11-23T01:01:21.567

(1) This looks like a bug in MultiWorkerMirroredStrategy. Please file a bug with TensorFlow. In TensorFlow 1.x it should use master, and in TensorFlow 2.x it should use chief. The code (incorrectly) requires chief, while AI Platform (because you are using 1.14) only provides master. By the way: master = chief + evaluator.

(2) Don't add tensorflow to your setup.py. Use the --runtime-version flag (see https://cloud.google.com/ml-engine/docs/runtime-version-list) to tell gcloud which TensorFlow version you want AI Platform to use.

answered 2019-10-06T18:47:19.363