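I am using the Google Cloud Pipeline Components op CustomPythonPackageTrainingJobRunOp in a Vertex AI pipeline. I have previously been able to run this same package successfully as a CustomTrainingJob. The logs show several (11) error messages, but the only one that seems meaningful to me is "ValueError: too many values to unpack (expected 2)", and I cannot work out how to fix it. I can add all the other error messages if needed. I log a few messages at the very start of the training code, so I know the error occurs before the training code is executed. I am completely stuck on this. Links to examples of someone using CustomPythonPackageTrainingJobRunOp in a pipeline would also be very helpful. Below is the pipeline code I am trying to run: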

import kfp
from kfp.v2 import compiler
from kfp.v2.google.client import AIPlatformClient
from google_cloud_pipeline_components import aiplatform as gcc_aip

@kfp.dsl.pipeline(name=pipeline_name)
def pipeline(
    project: str = "adsfafs-321118",
    location: str = "us-central1",
    display_name: str = "vertex_pipeline",
    python_package_gcs_uri: str = "gs://vertex/training/training-package-3.0.tar.gz",
    python_module_name: str = "trainer.task",
    container_uri: str = "us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0-23:latest",
    staging_bucket: str = "vertex_bucket",
    base_output_dir: str = "gs://vertex_artifacts/custom_training/"
):
    
    gcc_aip.CustomPythonPackageTrainingJobRunOp(
        display_name=display_name,
        python_package_gcs_uri=python_package_gcs_uri,
        python_module=python_module_name,
        container_uri=container_uri,
        project=project,
        location=location,
        staging_bucket=staging_bucket,
        base_output_dir=base_output_dir,
        args = ["--arg1=val1", "--arg2=val2", ...]
    )



compiler.Compiler().compile(
    pipeline_func=pipeline, package_path=package_path
)

api_client = AIPlatformClient(project_id=project_id, region=region)

response = api_client.create_run_from_job_spec(
    package_path,
    pipeline_root=pipeline_root_path
)

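In the documentation for CustomPythonPackageTrainingJobRunOp, the type of the parameter "python_module" appears to be "google.cloud.aiplatform.training_jobs.CustomPythonPackageTrainingJob" rather than a string, which seems odd. I nevertheless tried redefining the pipeline so that the python_module argument of CustomPythonPackageTrainingJobRunOp is a CustomPythonPackageTrainingJob object instead of a string, as shown below, but I still get the same error: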

def pipeline(
    project: str = "...",
    location: str = "...",
    display_name: str = "...",
    python_package_gcs_uri: str = "...",
    python_module_name: str = "...",
    container_uri: str = "...",
    staging_bucket: str = "...",
    base_output_dir: str = "...",
):

    job = aiplatform.CustomPythonPackageTrainingJob(
        display_name= display_name,
        python_package_gcs_uri=python_package_gcs_uri,
        python_module_name=python_module_name,
        container_uri=container_uri,
        staging_bucket=staging_bucket
    )
    
    gcc_aip.CustomPythonPackageTrainingJobRunOp(
        display_name=display_name,
        python_package_gcs_uri=python_package_gcs_uri,
        python_module=job,
        container_uri=container_uri,
        project=project,
        location=location,
        base_output_dir=base_output_dir,
        args = ["--arg1=val1", "--arg2=val2", ...]
    )

Edit:

Added the args that I was passing but had forgotten to include here.

1 Answer

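It turns out that the way I was passing args to the Python module was incorrect. Instead of args = ["--arg1=val1", "--arg2=val2", ...], you need to specify args = ["--arg1", val1, "--arg2", val2, ...], so that each flag and its value are separate list elements.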
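If you already have a list written in the "--flag=value" style, a small helper can convert it to the separated form described in this answer. This is only a minimal sketch: split_flag_args is a hypothetical name of my own, not part of google_cloud_pipeline_components or kfp, and it assumes every value is a plain string.

```python
from typing import List


def split_flag_args(args: List[str]) -> List[str]:
    """Turn "--flag=value" items into separate "--flag", "value" items.

    Hypothetical helper for illustration only; it is not a library function.
    Items without "=" (for example bare switches) are passed through unchanged.
    """
    result: List[str] = []
    for item in args:
        if item.startswith("--") and "=" in item:
            # Split only at the first "=", so values containing "=" stay intact.
            flag, _, value = item.partition("=")
            result.extend([flag, value])
        else:
            result.append(item)
    return result


if __name__ == "__main__":
    print(split_flag_args(["--arg1=val1", "--arg2=val2"]))
    # prints ['--arg1', 'val1', '--arg2', 'val2']
```

The resulting list could then be passed as the args argument of CustomPythonPackageTrainingJobRunOp in place of the original one.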

Answered 2021-08-16T22:11:30.017