tensorflow - TensorFlow / AI Platform: HyperTune trials fail to report the hyperparameter tuning metric

Question

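I am using the tf.estimator API with TensorFlow 2.1 on Google AI Platform to build a DNN regressor. To use AI Platform Training hyperparameter tuning, I followed Google's documentation and used the following configuration parameters:

config.yaml: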

trainingInput:
    scaleTier: BASIC
    hyperparameters:
        goal: MINIMIZE
        maxTrials: 2
        maxParallelTrials: 2
        hyperparameterMetricTag: rmse
        enableTrialEarlyStopping: True
        params:
        - parameterName: batch_size
          type: DISCRETE
          discreteValues:
          - 100
          - 200
          - 300
        - parameterName: lr
          type: DOUBLE
          minValue: 0.0001
          maxValue: 0.1
          scaleType: UNIT_LOG_SCALE
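
For context, AI Platform Training passes each tuned hyperparameter to the training application as a command-line flag named after its parameterName. The following is a minimal sketch of how the trainer might read the two parameters above; the function name parse_hparams and the default values are hypothetical and not part of my actual code.

```python
import argparse


def parse_hparams(argv=None):
    """Parse the tuned hyperparameters passed in as command-line flags."""
    parser = argparse.ArgumentParser()
    # Flag names mirror the parameterName entries in config.yaml.
    # The defaults are placeholders used only when a flag is absent.
    parser.add_argument("--batch_size", type=int, default=100)
    parser.add_argument("--lr", type=float, default=0.001)
    # Tolerate flags this sketch does not declare (for example --job-dir).
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Calling parse_hparams() with no arguments reads sys.argv, so the returned batch_size and lr can be fed directly into the input function and the optimizer.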

To add the metric to my summaries, I used the following code for my DNNRegressor:

def rmse(labels, predictions):
    pred_values = predictions['predictions']
    rmse = tf.keras.metrics.RootMeanSquaredError(name='root_mean_squared_error')
    rmse.update_state(labels, pred_values)
    return {'rmse': rmse}

def train_and_evaluate(hparams):
    ...
    estimator = tf.estimator.DNNRegressor(
        model_dir=output_dir,
        feature_columns=get_cols(),
        hidden_units=[max(2, int(FIRST_LAYER_SIZE * SCALE_FACTOR ** i))
                      for i in range(NUM_LAYERS)],
        optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
        config=run_config)
    estimator = tf.estimator.add_metrics(estimator, rmse)
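
A related detail in the snippet above: model_dir is set to output_dir as-is. With maxParallelTrials: 2, it may matter that each trial writes to its own directory. A common pattern, sketched here under the assumption that the service exposes the trial ID under task.trial in the TF_CONFIG environment variable, is to append that ID to the base path. The helper name trial_output_dir is hypothetical.

```python
import json
import os
import posixpath


def trial_output_dir(base_dir, environ=None):
    """Return base_dir with the trial ID appended, if one is set."""
    env = os.environ if environ is None else environ
    # TF_CONFIG is a JSON string; it is absent when running locally.
    tf_config = json.loads(env.get("TF_CONFIG") or "{}")
    trial = str(tf_config.get("task", {}).get("trial", ""))
    # posixpath keeps forward slashes, which suits gs:// paths.
    return posixpath.join(base_dir, trial) if trial else base_dir
```

When run locally without TF_CONFIG, the function returns base_dir unchanged.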

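According to Google's documentation, the add_metrics function creates a new estimator with the specified metric, which is then used as the hyperparameter metric. However, the AI Platform Training service does not recognize this metric: Job details on AI Platform

When I run the code locally, the rmse metric does appear in the logs. So how can I make the metric available to the training job on AI Platform when using Estimators?

There is also the option of reporting the metric through the cloudml-hypertune Python package, but it requires the metric value as one of its input arguments. How do I extract the metric from the tf.estimator.train_and_evaluate function (since that is what I use to train and evaluate my estimator) so that I can pass it to the report_hyperparameter_tuning_metric function?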

hpt = hypertune.HyperTune()
hpt.report_hyperparameter_tuning_metric(
    hyperparameter_metric_tag='rmse',
    metric_value=??,
    global_step=1000
)
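
To make the question concrete, here is a minimal sketch of what I have in mind. It assumes that tf.estimator.train_and_evaluate returns a tuple whose first element is the evaluation result dict (the API documentation describes this for non-distributed training and calls the return value undefined in distributed mode), and that this dict contains the 'rmse' key added via add_metrics together with 'global_step'. The helper name report_eval_metric is hypothetical.

```python
def report_eval_metric(eval_result, reporter, metric_tag="rmse"):
    """Forward one metric from an evaluation result dict to HyperTune.

    eval_result: dict of evaluation metrics, or None if no evaluation ran.
    reporter: object exposing report_hyperparameter_tuning_metric,
        such as hypertune.HyperTune().
    Returns True if the metric was reported, False otherwise.
    """
    if not eval_result or metric_tag not in eval_result:
        return False
    reporter.report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag=metric_tag,
        # Convert NumPy scalars to plain Python numbers.
        metric_value=float(eval_result[metric_tag]),
        global_step=int(eval_result.get("global_step", 0)),
    )
    return True


# Intended use inside train_and_evaluate(hparams), assuming train_spec
# and eval_spec are already defined there:
#   import hypertune
#   eval_result, _ = tf.estimator.train_and_evaluate(
#       estimator, train_spec, eval_spec)
#   report_eval_metric(eval_result, hypertune.HyperTune())
```

Whether this is the recommended way to report the metric for Estimators is exactly what I am unsure about.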

ETA: The logs show no errors. They say the job completed successfully, even though it failed.
