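The notion of a runtime version runs through Google AI Platform Training and Prediction, from training all the way to prediction, and I am a bit confused by it. A runtime version must be specified when starting training (see here). Another must be specified when exporting the model in TensorFlow SavedModel format, and yet another when creating a model version.
An example. I submit training jobs with the Google API Client Library for Python, so I configured my training job following this guide. Since I am using the TensorFlow Object Detection API, my configuration file looks like this: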
{
  "jobId": "...",
  "trainingInput": {
    "runtimeVersion": "2.1",
    "pythonVersion": "3.7",
    "scaleTier": "CUSTOM",
    "masterType": "standard",
    "workerCount": "1",
    "workerType": "cloud_tpu",
    "workerConfig": {
      "tpuTfVersion": "1.15"
    },
    "region": "us-central1",
    "jobDir": "...",
    "pythonModule": "object_detection.model_main_tf2",
    "args": [
      "--model_dir",
      "...",
      "--pipeline_config_path",
      "..."
    ]
  }
}
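For context, submitting such a job spec through the Google API Client Library for Python might look roughly like the sketch below. The helper name `submit_training_job`, the `project_id` argument, and the injectable `ml_client` parameter are illustrative assumptions and not part of my actual setup; the `discovery.build("ml", "v1")` and `projects().jobs().create(...)` calls follow the pattern shown in the client library documentation.

```python
def submit_training_job(project_id, job_spec, ml_client=None):
    """Submit job_spec (a dict shaped like the JSON above) as a training job.

    ml_client is a hypothetical injection point so the function can be
    exercised without credentials; by default a real client is built.
    """
    if ml_client is None:
        # Imported lazily so this module loads even where
        # google-api-python-client is not installed.
        from googleapiclient import discovery
        ml_client = discovery.build("ml", "v1")

    # The API expects the parent in the form "projects/<project id>".
    parent = "projects/{}".format(project_id)
    request = ml_client.projects().jobs().create(parent=parent, body=job_spec)
    return request.execute()
```

The export job shown next would go through the same call, only with a different `job_spec`.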
Next is my export configuration file:
{
  "jobId": "...",
  "trainingInput": {
    "runtimeVersion": "2.1",
    "pythonVersion": "3.7",
    "scaleTier": "CUSTOM",
    "masterType": "standard",
    "workerCount": "1",
    "workerType": "standard",
    "region": "us-central1",
    "pythonModule": "object_detection.exporter_main_v2",
    "args": [
      "--input_type",
      "image_tensor",
      "--pipeline_config_path",
      "...",
      "--trained_checkpoint_dir",
      "...",
      "--output_directory",
      "..."
    ]
  }
}
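As you can see, I have to specify the runtime version again. Does it have to be the same as the one used for the training job? For the last part, after creating the model, I have to create a version. My configuration file: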
{
  "name": "v1",
  "description": "version description",
  "isDefault": "False",
  "deploymentUri": "...",
  "createTime": "string",
  "runtimeVersion": "2.1",
  "machineType": "mls1-c1-m2",
  "framework": "TENSORFLOW",
  "pythonVersion": "3.7"
}
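And the runtime version appears once more. Same question: does it have to be the same as the ones used before? Or can the runtime versions for training and prediction differ?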
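To make the comparison concrete, here is a minimal sketch that collects the versions declared in the three configuration files above and reports which ones differ from the training runtime. The function names and stage labels are hypothetical, and the dicts only reproduce the version fields from the configs. It does not settle whether the versions must match; it only shows where they currently differ.

```python
def collect_runtime_versions(training_job, export_job, model_version):
    """Return the version string declared by each stage, keyed by stage label."""
    declared = {
        "training": training_job["trainingInput"]["runtimeVersion"],
        "export": export_job["trainingInput"]["runtimeVersion"],
        "serving": model_version["runtimeVersion"],
    }
    # tpuTfVersion is only present when the job defines a workerConfig.
    worker_config = training_job["trainingInput"].get("workerConfig", {})
    tpu_version = worker_config.get("tpuTfVersion")
    if tpu_version is not None:
        declared["training_tpu"] = tpu_version
    return declared


def stages_differing_from(declared, reference_stage="training"):
    """List the stage labels whose version differs from the reference stage."""
    reference = declared[reference_stage]
    return sorted(s for s, v in declared.items() if v != reference)


# Version fields copied from the three configs in this question.
training_job = {
    "trainingInput": {
        "runtimeVersion": "2.1",
        "workerConfig": {"tpuTfVersion": "1.15"},
    }
}
export_job = {"trainingInput": {"runtimeVersion": "2.1"}}
model_version = {"runtimeVersion": "2.1"}

declared = collect_runtime_versions(training_job, export_job, model_version)
mismatches = stages_differing_from(declared)
print(mismatches)  # ['training_tpu']
```

With my current files, the three `runtimeVersion` fields agree at 2.1, and the only value that stands out is `tpuTfVersion` at 1.15.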