google-cloud-ml - Vertex Pipelines metric values not being added to the Metrics artifact?

Question

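We are trying to return some metrics from our Vertex Pipeline so that they are visible in the run comparison and metadata tools in the Vertex UI.

I saw here that we can use the output type Output[Metrics] and then call metrics.log_metric("metric_name", metric_val) to add metrics, and from the available documentation it seemed that this would be enough.

We want to use the reusable component approach rather than the Python function-based components that the examples are built on. So we implemented it in our component code as follows:

We added the output in component.yaml: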

outputs:
    - name: metrics
      type: Metrics
      description: evaluation metrics path

Then we added the output to the command in the implementation:

        command: [
            python3, main.py,
            --gcs-test-data-path,       {inputValue: gcs_test_data_path},
            --gcs-model-path,  {inputValue: gcs_model_path},
            --gcs-output-bucket-id,  {inputValue: gcs_output_bucket_id},
            --project-id, {inputValue: project_id},
            --timestamp, {inputValue: timestamp},
            --batch-size, {inputValue: batch_size},
            --img-height, {inputValue: img_height},
            --img-width,  {inputValue: img_width},
            --img-depth,  {inputValue: img_depth},
            --metrics,  {outputPath: metrics},
        ]

Next, in the component's main Python script, we parse this argument with argparse:

PARSER.add_argument('--metrics',
                    type=Metrics,
                    required=False,
                    help='evaluation metrics output')

and pass it to the component's main function:

if __name__ == '__main__':
    ARGS = PARSER.parse_args()
    evaluation(gcs_test_data_path=ARGS.gcs_test_data_path,
               gcs_model_path=ARGS.gcs_model_path,
               gcs_output_bucket_id=ARGS.gcs_output_bucket_id,
               project_id=ARGS.project_id,
               timestamp=ARGS.timestamp,
               batch_size=ARGS.batch_size,
               img_height=ARGS.img_height,
               img_width=ARGS.img_width,
               img_depth=ARGS.img_depth,
               metrics=ARGS.metrics,
               )
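One detail in the parsing step above may be worth checking. The `{outputPath: metrics}` placeholder in component.yaml is replaced at runtime with a file path, so the script receives a plain string. With `type=Metrics`, argparse would call `Metrics` on that string, which (as far as I can tell, the thread does not confirm this) builds a new in-memory object that is not connected to the pipeline's artifact. A minimal sketch of parsing the argument as a path instead; the `build_parser` helper and the example path are hypothetical:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the evaluation script (only --metrics shown)."""
    parser = argparse.ArgumentParser(description="evaluation component (sketch)")
    # {outputPath: metrics} is substituted with a file path,
    # so the value is parsed as a plain string, not as a Metrics object.
    parser.add_argument("--metrics",
                        type=str,
                        required=False,
                        help="path where evaluation metrics should be written")
    return parser


# Hypothetical path, standing in for whatever the pipeline backend passes.
args = build_parser().parse_args(["--metrics", "/tmp/outputs/metrics/data"])
print(type(args.metrics).__name__, args.metrics)
```

With this change `ARGS.metrics` is only a location; anything that should end up in the artifact has to be written there explicitly by the script.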

In the declaration of the component function, we typed this metrics parameter as Output[Metrics]:

from kfp.v2.dsl import Output, Metrics

def evaluation(gcs_test_data_path: str,
               gcs_model_path: str,
               gcs_output_bucket_id: str,
               metrics: Output[Metrics],
               project_id: str,
               timestamp: str,
               batch_size: int,
               img_height: int,
               img_width: int,
               img_depth: int):

Finally, we call the log_metric method inside this evaluation function:

    metrics.log_metric('accuracy', acc)
    metrics.log_metric('precision', prec)
    metrics.log_metric('recall', recall)
    metrics.log_metric('f1-score', f_1)

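When we run this pipeline, we can see the metrics artifact in the DAG:

And the Metrics artifacts are listed in the Vertex Metadata UI:

However, when we click through to view the artifact JSON, no metadata is listed:

In addition, no metadata is visible when comparing runs in the pipeline UI:

Finally, when navigating to the object's URI in GCS, we get "Requested entity was not found.", which I assume indicates that nothing was written to GCS:

Are we doing something wrong with this metrics implementation in a reusable component? As far as I can tell everything looks correct, but it is hard to say because the documentation currently seems to focus mainly on examples of Python function-based components.

Do we perhaps need to actively write this Metrics object to the OutputPath?

Any help is appreciated.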

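- - - UPDATE - - -

I have since been able to get the artifact metadata and URI to update. In the end, we used the kfp SDK to generate a yaml file from a Python function decorated with @component, and then adapted that format for our reusable component. Our component.yaml now looks like this: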

name: predict
description: Prepare and create predictions request
implementation:
    container:
      args:
      - --executor_input
      - executorInput: null
      - --function_to_execute
      - predict
      command:
      - python3
      - -m
      - kfp.v2.components.executor_main
      - --component_module_path
      - predict.py
      image: gcr.io/PROJECT_ID/kfp/components/predict:latest
inputs: 
    - name: input_1
      type: String
    - name: input_2
      type: String
outputs:
    - name: output_1
      type: Dataset
    - name: output_2
      type: Dataset

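With this change to the yaml, we can now successfully update the artifact's metadata dictionary, and its URI via artifact.path = '/path/to/file'. These updates are displayed in the Vertex UI.

I am still unsure why the component.yaml format specified in the Kubeflow documentation does not work. I think this may be a bug in Vertex Pipelines.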

score 1 · Accepted Answer

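From what I can see in the code you are running, everything should work; however, as you mentioned, I would recommend writing the metrics object to the path so that it ends up at a location within your project.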
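As a rough illustration of what "writing the metrics to the path" could look like, here is a minimal sketch that dumps the metric values as JSON to the output path the component receives. The `write_metrics` helper, the flat JSON layout, and the example values are assumptions for illustration. Note also that this only places a file at the artifact's location; the thread does not confirm that it fills in the metadata shown in the Vertex UI, and the asker's update suggests the metadata only appeared after switching to the executor-based component.yaml.

```python
import json
import os
import tempfile


def write_metrics(path: str, metrics: dict) -> None:
    """Write a dict of metric values as JSON to the given output path."""
    parent = os.path.dirname(path)
    if parent:
        # The parent directory of an output path may not exist yet.
        os.makedirs(parent, exist_ok=True)
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2)


# Example values only; in the component these would come from the evaluation.
demo_path = os.path.join(tempfile.mkdtemp(), "metrics", "data")
write_metrics(demo_path, {"accuracy": 0.9, "precision": 0.8,
                          "recall": 0.7, "f1-score": 0.75})

# Read the file back to confirm that something was written.
with open(demo_path) as f:
    loaded = json.load(f)
print(loaded)
```

In the evaluation function this would be called with the string received through `--metrics` after the metric values have been computed.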
