
I created a training job that fetches my data from BigQuery, performs the training, and deploys the model. I want training to start automatically in two cases:

  1. More than 1000 new rows have been added to the dataset
  2. On a schedule (for example, once a week)

I looked at GCP Cloud Scheduler, but it doesn't seem to fit my case.


2 Answers


Cloud Scheduler is the right tool for triggering training on a schedule. I'm not sure what your blocker is!

For your first point, you can't do it directly. You cannot put a trigger (on BigQuery or another database) that sends an event after X new rows. Instead, I suggest the following:

  • Schedule a job with Cloud Scheduler (for example, every 10 minutes)
  • That job runs a query against BigQuery and checks the number of rows added since the last training job (the date of the last training job has to be stored somewhere; I recommend another BigQuery table)
    • If the row count is > 1000, trigger your training job (see the sketch after this list)
    • Otherwise, exit the function
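
A minimal sketch of that check, assuming hypothetical table names (my_project.my_dataset.training_data with an ingestion_time column, and my_project.my_dataset.training_state holding the last training timestamp); submit_training_job() is a placeholder for whatever actually starts your job:

from google.cloud import bigquery

DATA_TABLE = "my_project.my_dataset.training_data"    # hypothetical table with an ingestion_time column
STATE_TABLE = "my_project.my_dataset.training_state"  # hypothetical table holding last_training_ts

def check_and_trigger(event, context):
    client = bigquery.Client()

    # Read the timestamp of the last training run (written by the training/deploy step).
    # Note: on the very first run this will be NULL and no rows will match below.
    last_ts = list(client.query(
        f"SELECT MAX(last_training_ts) AS ts FROM `{STATE_TABLE}`"
    ).result())[0].ts

    # Count the rows added since that timestamp
    job_config = bigquery.QueryJobConfig(query_parameters=[
        bigquery.ScalarQueryParameter("last_ts", "TIMESTAMP", last_ts)
    ])
    new_rows = list(client.query(
        f"SELECT COUNT(*) AS new_rows FROM `{DATA_TABLE}` WHERE ingestion_time > @last_ts",
        job_config=job_config,
    ).result())[0].new_rows

    if new_rows > 1000:
        submit_training_job()  # placeholder: e.g. the AI Platform jobs.create call from the other answer
    # otherwise just exit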

As you can see, this isn't easy, and there are a few caveats:

  • When you deploy the model, you also have to write down the date of that training run (see the snippet after this list)
  • You have to run repeated queries against BigQuery. Partition your tables correctly to limit the cost
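
For the first caveat, the training/deploy job can append its completion time to the same hypothetical state table when it finishes, for example:

from google.cloud import bigquery

def record_training_time():
    # Append the completion time of this training run to the (hypothetical) state table
    bigquery.Client().query(
        "INSERT INTO `my_project.my_dataset.training_state` (last_training_ts) "
        "VALUES (CURRENT_TIMESTAMP())"
    ).result()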

Does that make sense to you?

EDIT

The gcloud commands are "simple" wrappers around API calls. Try adding the --log-http flag to your gcloud command to see which API is called and with which parameters.

In any case, you can start a job by calling this API, and if you like, you can use the gcloud SDK's --log-http flag to discover it!

Answered on 2020-06-27T18:44:21.320

For anyone looking for a solution to submit a training job on a schedule, here I am posting my solution after trying a few approaches. I tried:

  • Running it through Cloud Composer using Airflow
  • Starting the job using a start script
  • Using cron with Cloud Scheduler, Pub/Sub and a Cloud Function

The easiest and most cost-effective way is using Cloud Scheduler and the AI Platform client library with a Cloud Function.

Step 1 - create a Pub/Sub topic (for example, start-training)

Step 2 - create a cron job using Cloud Scheduler targeting the start-training topic (a scripted alternative to steps 1 and 2 is sketched below)

(screenshot of the Cloud Scheduler job configuration)
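
If you prefer to script steps 1 and 2 instead of using the console, a rough sketch using recent versions of the google-cloud-pubsub and google-cloud-scheduler client libraries could look like this (project ID, region and cron expression are placeholders):

from google.cloud import pubsub_v1, scheduler_v1

project_id = "<PROJECT ID>"
region = "asia-northeast1"   # Cloud Scheduler location
topic_id = "start-training"

# Step 1: create the Pub/Sub topic
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_id)
publisher.create_topic(request={"name": topic_path})

# Step 2: create a Cloud Scheduler job that publishes to the topic once a week
scheduler = scheduler_v1.CloudSchedulerClient()
parent = f"projects/{project_id}/locations/{region}"
job = {
    "name": f"{parent}/jobs/weekly-training",
    "schedule": "0 9 * * 1",   # every Monday at 09:00
    "time_zone": "Etc/UTC",
    "pubsub_target": {"topic_name": topic_path, "data": b"start"},
}
scheduler.create_job(request={"parent": parent, "job": job})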

Step 3 - create a Cloud Function with the trigger type Cloud Pub/Sub, the topic start-training, and the entry point submit_job. This function submits a training job to AI Platform through the Python client library.

Now we have this beautiful DAG

Scheduler -> Pub/Sub -> Cloud Function -> AI-platform

The Cloud Function code goes like this:

main.py

import datetime
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

id = '<PROJECT ID>'
bucket_name = "<BUCKET NAME>"
project_id = 'projects/{}'.format(id)

def submit_job(event, context):
    # Build a unique job ID per invocation so repeated runs don't collide
    job_name = "training_" + datetime.datetime.now().strftime("%y%m%d_%H%M%S")

    training_inputs = {
        'scaleTier': 'BASIC',
        'packageUris': [f"gs://{bucket_name}/package/trainer-0.1.tar.gz"],
        'pythonModule': 'trainer.task',
        'region': 'asia-northeast1',
        'jobDir': f"gs://{bucket_name}",
        'runtimeVersion': '2.2',
        'pythonVersion': '3.7',
    }

    job_spec = {"jobId": job_name, "trainingInput": training_inputs}
    # cache_discovery=False avoids the file_cache ImportError caused by oauth2client
    cloudml = discovery.build("ml", "v1", cache_discovery=False)
    request = cloudml.projects().jobs().create(body=job_spec, parent=project_id)
    response = request.execute()
    return response

requirements.txt

google-api-python-client
oauth2client

Important

  • Make sure to use the project ID, not the project name, otherwise you will get a permission error

  • If you get an "ImportError: file_cache is unavailable when using oauth2client ..." error, use cache_discovery=False in the build function; otherwise, leave the function to use the cache for performance reasons

  • Point to the correct GCS location of your source package. In this case my package, named trainer, is built and located in the package folder of the bucket, and the main module is task
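
To check the whole chain without waiting for the cron, you can publish a test message to the start-training topic yourself, for example with the google-cloud-pubsub library (the project ID is a placeholder):

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("<PROJECT ID>", "start-training")

# Publishing any message to the topic triggers the submit_job Cloud Function
future = publisher.publish(topic_path, data=b"manual test")
print("Published message ID:", future.result())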

Answered on 2020-11-06T05:51:35.450