I created a training job that fetches data from BigQuery, trains a model, and deploys it. I want to start training automatically in two cases:
- more than 1,000 new rows are added to the dataset
- on a schedule (for example, once a week)
I looked at GCP Cloud Scheduler, but it doesn't seem to fit my case.
Cloud Scheduler is the right tool for triggering training on a schedule, so I'm not sure what is blocking you there!
For your first point, you can't do it directly: you can't put a trigger on BigQuery (or another database) that emits an event after X new rows. For that, I suggest a scheduled check as a workaround, sketched below. As you can see, it's not easy, and there are several caveats.
Does this make sense to you?
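As a rough illustration (a minimal sketch, not a definitive implementation: the table, state object, topic name, and threshold below are all assumptions), a Cloud Function run on a schedule can compare the current row count with the last one it acted on and publish to a training topic once 1,000 new rows have arrived:

from google.cloud import bigquery, pubsub_v1, storage

PROJECT = "<PROJECT ID>"
TABLE = "<DATASET>.<TABLE>"            # assumed: the table to watch
STATE_BUCKET = "<BUCKET NAME>"         # assumed: bucket holding the last count
STATE_BLOB = "state/last_row_count"    # assumed: object storing the last count
TOPIC = f"projects/{PROJECT}/topics/start-training"
THRESHOLD = 1000

def check_new_rows(event, context):
    # Current number of rows in the watched table.
    bq = bigquery.Client(project=PROJECT)
    current = next(iter(bq.query(f"SELECT COUNT(*) AS n FROM `{TABLE}`").result())).n

    # Last count we acted on, defaulting to 0 on the first run.
    blob = storage.Client(project=PROJECT).bucket(STATE_BUCKET).blob(STATE_BLOB)
    last = int(blob.download_as_text()) if blob.exists() else 0

    if current - last >= THRESHOLD:
        # Enough new rows: kick off training and remember the new baseline.
        pubsub_v1.PublisherClient().publish(TOPIC, b"train").result()
        blob.upload_from_string(str(current))

Caveats of this kind of polling: it only notices inserts between runs, COUNT(*) ignores updates and deletes, and a GCS object is a very crude state store.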
EDIT
The gcloud commands are "simple" wrappers around API calls. Try adding the --log-http flag to your gcloud command to see which API is called and with which parameters.
In any case, you can start the job by calling that API directly and, if you want, keep using the --log-http flag of the gcloud SDK to inspect the calls!
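For example, to see the underlying API call when submitting a training job (the job name, region, and package path here are placeholders, not values from your setup):

gcloud ai-platform jobs submit training my_job \
    --region=us-central1 \
    --module-name=trainer.task \
    --packages=gs://my-bucket/trainer-0.1.tar.gz \
    --log-http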
For anyone looking for a solution to submit a training job on a schedule, here is the solution I landed on after trying a few approaches.
The easiest and most cost-effective way is to use Cloud Scheduler and the AI Platform client library with a Cloud Function:
step 1 - create a Pub/Sub topic (for example start-training)
step 2 - create a cron job using Cloud Scheduler targeting the start-training topic
step 3 - create a Cloud Function with trigger type Cloud Pub/Sub, topic start-training, and entry point submit_job. This function submits a training job to AI Platform through the Python client library.
Now we have this beautiful DAG:
Scheduler -> Pub/Sub -> Cloud Function -> AI-platform
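If you prefer to wire this up from the CLI, a sketch looks like this (the schedule, job name, and runtime are my assumptions; the deploy is run from the directory containing main.py and requirements.txt):

gcloud pubsub topics create start-training

gcloud scheduler jobs create pubsub weekly-training \
    --schedule="0 9 * * 1" \
    --topic=start-training \
    --message-body="train"

gcloud functions deploy submit_job \
    --runtime=python37 \
    --trigger-topic=start-training \
    --entry-point=submit_job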
The Cloud Function code goes like this:
main.py
import datetime

from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

id = '<PROJECT ID>'
bucket_name = "<BUCKET NAME>"
project_id = 'projects/{}'.format(id)


def submit_job(event, context):
    # Build the job name inside the handler so every invocation gets a unique
    # jobId (computing it at module level would reuse the same name on warm
    # starts and make the create call fail).
    job_name = "training_" + datetime.datetime.now().strftime("%y%m%d_%H%M%S")

    # Training job configuration: where the packaged trainer lives, which
    # module to run, and on which machines.
    training_inputs = {
        'scaleTier': 'BASIC',
        'packageUris': [f"gs://{bucket_name}/package/trainer-0.1.tar.gz"],
        'pythonModule': 'trainer.task',
        'region': 'asia-northeast1',
        'jobDir': f"gs://{bucket_name}",
        'runtimeVersion': '2.2',
        'pythonVersion': '3.7',
    }
    job_spec = {"jobId": job_name, "trainingInput": training_inputs}

    # Submit the job through the AI Platform (ml/v1) REST API.
    cloudml = discovery.build("ml", "v1", cache_discovery=False)
    request = cloudml.projects().jobs().create(body=job_spec, parent=project_id)
    response = request.execute()
requirements.txt
google-api-python-client
oauth2client
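Once everything is deployed, you can test the whole chain without waiting for the cron by publishing to the topic manually (a standard gcloud command, using the topic name above):

gcloud pubsub topics publish start-training --message=train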
Important
- Make sure to use the project ID, not the project name; otherwise you will get a permission error.
- If you get an ImportError: file_cache is unavailable when using oauth2client ... error, use cache_discovery=False in the build function; otherwise leave it out so the client can use the cache for performance.
- Point packageUris to the correct GCS location of your source package. In this case my package is named trainer, built and placed in the package folder of the bucket, and the main module is task.