python - Vertex AI custom container training job, Python SDK - InvalidArgument 400 error

Asked 2022-01-05T16:53:55.217

Viewed 49 times

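I am trying to run a Vertex AI custom training job with the Python SDK, following the general instructions laid out in this README. My code is below (sensitive data removed):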

```python
from google.cloud import aiplatform

# Values in {BRACES} are placeholders for the sensitive data that was removed.
job = aiplatform.CustomContainerTrainingJob(
    display_name='python_api_test',
    container_uri='{URI FOR CUSTOM CONTAINER IN GOOGLE ARTIFACT REGISTRY}',
    staging_bucket='{GCS BUCKET PATH IN "gs://" FORMAT}',
    model_serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-4:latest',
)

job.run(
    model_display_name='python_api_model',
    args='{ARG PASSED TO CONTAINER ENTRYPOINT}',
    replica_count=1,
    machine_type='n1-standard-4',
    accelerator_type='NVIDIA_TESLA_T4',
    accelerator_count=2,
    environment_variables={
        # {A COUPLE OF SECRETS PASSED TO CONTAINER IN DICTIONARY FORMAT}
    },
)
```

When I call `job.run()`, I get the following error:

```
InvalidArgument: 400 Unable to parse `training_pipeline.training_task_inputs` into custom task `inputs` defined in the file: gs://google-cloud-aiplatform/schema/trainingjob/definition/custom_task_1.0.0.yaml
```
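The error refers to `training_task_inputs`, which for a custom container job is expected to carry the worker-pool configuration built from the `job.run()` arguments. As a rough sketch, assuming it follows the `CustomJobSpec` proto layout (`worker_pool_specs`, `machine_spec`, `container_spec`, `env`), the question's arguments would end up in a structure like the one below. `build_task_inputs` is a hypothetical helper for illustration, not the SDK's actual code path, and the real request may contain additional fields (for example a base output directory).

```python
import json
from typing import Any, Dict, Mapping


def build_task_inputs(
    image_uri: str,
    args: Any,
    machine_type: str,
    accelerator_type: str,
    accelerator_count: int,
    replica_count: int,
    environment_variables: Mapping[str, str],
) -> Dict[str, Any]:
    # Hypothetical helper (not part of the SDK): approximates the worker-pool
    # portion of training_task_inputs using CustomJobSpec proto field names.
    # The environment-variable dict becomes a list of {name, value} entries.
    env = [{"name": k, "value": v} for k, v in environment_variables.items()]
    return {
        "worker_pool_specs": [
            {
                "machine_spec": {
                    "machine_type": machine_type,
                    "accelerator_type": accelerator_type,
                    "accelerator_count": accelerator_count,
                },
                "replica_count": replica_count,
                "container_spec": {
                    "image_uri": image_uri,
                    # ContainerSpec.args is a repeated string field (a JSON array).
                    "args": args,
                    "env": env,
                },
            }
        ]
    }


inputs = build_task_inputs(
    image_uri="CONTAINER_URI",  # placeholder
    args="ARG",  # a bare string, mirroring the question's call
    machine_type="n1-standard-4",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=2,
    replica_count=1,
    environment_variables={"SECRET_1": "value"},  # hypothetical name and value
)
print(json.dumps(inputs, indent=2))
```

Printing the dict makes the shape visible: with the question's call, `args` appears as a JSON string where the schema field is a repeated (array) field, which is one place a parse failure like this could plausibly originate.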

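The full traceback doesn't indicate which particular input it objects to. I have successfully run jobs with the same container using the Vertex CLI. I believe my `aiplatform.init()` call is fine (I'm running the job from a Vertex Workbench machine in the same project).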
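Because the server error does not name the offending field, one option is a local type check before calling `job.run()`. This is a minimal sketch assuming the proto types in which `ContainerSpec.args` is a repeated string field and `EnvVar.name` / `EnvVar.value` are strings; `find_type_problems` is a hypothetical helper, and it only catches type mismatches, not values the service may reject for other reasons.

```python
from typing import Any, List, Mapping


def find_type_problems(args: Any, environment_variables: Mapping[Any, Any]) -> List[str]:
    """Hypothetical pre-flight check for job.run() inputs (not an SDK function)."""
    problems: List[str] = []

    # A bare string is iterable, so reject it explicitly: args should be a list.
    if isinstance(args, str) or not isinstance(args, (list, tuple)):
        problems.append(f"args: expected a list of strings, got {type(args).__name__}")

    # EnvVar.name and EnvVar.value are both string fields.
    for name, value in environment_variables.items():
        if not isinstance(name, str):
            problems.append(f"env var name {name!r} is not a str")
        if not isinstance(value, str):
            problems.append(f"env var {name!r}: value is {type(value).__name__}, expected str")

    return problems


# Usage with placeholder values (SECRET_1 and PORT are hypothetical names):
print(find_type_problems("ARG", {"SECRET_1": "x"}))
# → ['args: expected a list of strings, got str']
print(find_type_problems(["ARG"], {"SECRET_1": "x"}))
# → []
```

Checking locally like this narrows down which argument to change before resubmitting, since the 400 response itself only reports that the whole `training_task_inputs` payload failed to parse.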
