
I created a training job that fetches my data from BigQuery, performs the training, and deploys the model. I want training to start automatically in two cases:

  1. More than 1000 new rows have been added to the dataset
  2. On a schedule (for example, once a week)

I looked at GCP Cloud Scheduler, but it doesn't seem to fit my case.


2 Answers


Cloud Scheduler is the right tool for triggering training on a schedule. I'm not sure what your blocker is!

For your first point, you can't do it directly. You cannot put a trigger (on BigQuery or another database) that sends an event after X new rows. Instead, I suggest the following:

  • Schedule a job with Cloud Scheduler (for example, every 10 minutes)
  • That job runs a query against BigQuery and checks the number of rows added since the last training job (the date of the last training job has to be stored somewhere; I recommend another BigQuery table)
    • If the row count is > 1000, trigger your training job (see the sketch after this list)
    • Otherwise, exit the function
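
A minimal sketch of that check, assuming hypothetical table names (my_project.my_dataset.training_data with an ingestion_time column, and my_project.my_dataset.training_state holding the last training timestamp); submit_training_job() is a placeholder for whatever actually starts your job:

from google.cloud import bigquery

DATA_TABLE = "my_project.my_dataset.training_data"    # hypothetical table with an ingestion_time column
STATE_TABLE = "my_project.my_dataset.training_state"  # hypothetical table holding last_training_ts

def check_and_trigger(event, context):
    client = bigquery.Client()

    # Read the timestamp of the last training run (written by the training/deploy step).
    # Note: on the very first run this will be NULL and no rows will match below.
    last_ts = list(client.query(
        f"SELECT MAX(last_training_ts) AS ts FROM `{STATE_TABLE}`"
    ).result())[0].ts

    # Count the rows added since that timestamp
    job_config = bigquery.QueryJobConfig(query_parameters=[
        bigquery.ScalarQueryParameter("last_ts", "TIMESTAMP", last_ts)
    ])
    new_rows = list(client.query(
        f"SELECT COUNT(*) AS new_rows FROM `{DATA_TABLE}` WHERE ingestion_time > @last_ts",
        job_config=job_config,
    ).result())[0].new_rows

    if new_rows > 1000:
        submit_training_job()  # placeholder: e.g. the AI Platform jobs.create call from the other answer
    # otherwise just exit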

As you can see, this isn't easy, and there are a few caveats:

  • When you deploy the model, you also have to write down the date of that training run (see the snippet after this list)
  • You have to run repeated queries against BigQuery. Partition your tables correctly to limit the cost
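
For the first caveat, the training/deploy job can append its completion time to the same hypothetical state table when it finishes, for example:

from google.cloud import bigquery

def record_training_time():
    # Append the completion time of this training run to the (hypothetical) state table
    bigquery.Client().query(
        "INSERT INTO `my_project.my_dataset.training_state` (last_training_ts) "
        "VALUES (CURRENT_TIMESTAMP())"
    ).result()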

Does that make sense to you?

EDIT

The gcloud commands are "simple" wrappers around API calls. Try adding the --log-http flag to your gcloud command to see which API is called and with which parameters.

In any case, you can start a job by calling this API, and if you like, you can use the gcloud SDK's --log-http flag to discover it!

Answered on 2020-06-27T18:44:21.320

For anyone looking for a solution to submit a training job on a schedule, here I am posting my solution after trying a few approaches. I tried:

  • Running it through Cloud Composer using Airflow
  • Starting the job using a start script
  • Using cron with Cloud Scheduler, Pub/Sub and a Cloud Function

The easiest and most cost-effective way is using Cloud Scheduler and the AI Platform client library with a Cloud Function.

Step 1 - create a Pub/Sub topic (for example, start-training)

Step 2 - create a cron job using Cloud Scheduler targeting the start-training topic (a scripted alternative to steps 1 and 2 is sketched below)

(screenshot of the Cloud Scheduler job configuration)
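
If you prefer to script steps 1 and 2 instead of using the console, a rough sketch using recent versions of the google-cloud-pubsub and google-cloud-scheduler client libraries could look like this (project ID, region and cron expression are placeholders):

from google.cloud import pubsub_v1, scheduler_v1

project_id = "<PROJECT ID>"
region = "asia-northeast1"   # Cloud Scheduler location
topic_id = "start-training"

# Step 1: create the Pub/Sub topic
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_id)
publisher.create_topic(request={"name": topic_path})

# Step 2: create a Cloud Scheduler job that publishes to the topic once a week
scheduler = scheduler_v1.CloudSchedulerClient()
parent = f"projects/{project_id}/locations/{region}"
job = {
    "name": f"{parent}/jobs/weekly-training",
    "schedule": "0 9 * * 1",   # every Monday at 09:00
    "time_zone": "Etc/UTC",
    "pubsub_target": {"topic_name": topic_path, "data": b"start"},
}
scheduler.create_job(request={"parent": parent, "job": job})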

Step 3 - create a Cloud Function with the trigger type Cloud Pub/Sub, the topic start-training, and the entry point submit_job. This function submits a training job to AI Platform through the Python client library.

Now we have this beautiful DAG

Scheduler -> Pub/Sub -> Cloud Function -> AI-platform

The Cloud Function code goes like this:

main.py

import datetime
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

id = '<PROJECT ID>'
bucket_name = "<BUCKET NAME>"
project_id = 'projects/{}'.format(id)

def submit_job(event, context):
    # Build a unique job ID per invocation so repeated runs don't collide
    job_name = "training_" + datetime.datetime.now().strftime("%y%m%d_%H%M%S")

    training_inputs = {
        'scaleTier': 'BASIC',
        'packageUris': [f"gs://{bucket_name}/package/trainer-0.1.tar.gz"],
        'pythonModule': 'trainer.task',
        'region': 'asia-northeast1',
        'jobDir': f"gs://{bucket_name}",
        'runtimeVersion': '2.2',
        'pythonVersion': '3.7',
    }

    job_spec = {"jobId": job_name, "trainingInput": training_inputs}
    # cache_discovery=False avoids the file_cache ImportError caused by oauth2client
    cloudml = discovery.build("ml", "v1", cache_discovery=False)
    request = cloudml.projects().jobs().create(body=job_spec, parent=project_id)
    response = request.execute()
    return response

requirements.txt

google-api-python-client
oauth2client

Important

  • Make sure to use the project ID, not the project name, otherwise you will get a permission error

  • If you get an "ImportError: file_cache is unavailable when using oauth2client ..." error, use cache_discovery=False in the build function; otherwise, leave the function to use the cache for performance reasons

  • Point to the correct GCS location of your source package. In this case my package, named trainer, is built and located in the package folder of the bucket, and the main module is task
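
To check the whole chain without waiting for the cron, you can publish a test message to the start-training topic yourself, for example with the google-cloud-pubsub library (the project ID is a placeholder):

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("<PROJECT ID>", "start-training")

# Publishing any message to the topic triggers the submit_job Cloud Function
future = publisher.publish(topic_path, data=b"manual test")
print("Published message ID:", future.result())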

Answered on 2020-11-06T05:51:35.450