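I am using Vertex AI for AutoML video classification, and I would like to retrieve some of the data (model and dataset details) that I can see in the web UI (Cloud Console). I am using the AI Platform Python SDK or the REST API.
For example, the Model API returns the number of training videos but not the number of test videos (web UI: model details, EVALUATE tab).
Likewise, for the Model Properties tab in the web UI, I can't get the training time, total items, algorithm, or objective.
For the dataset details, I would like to get the number of labeled and unlabeled videos, and the labels with their corresponding counts.
Here is the code I use to get the data (as a component in a Vertex AI Pipeline):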
def get_metadata(project_id, region, model_id):
    # Imports live inside the function because it runs as a pipeline component.
    import requests
    import google.auth
    import google.cloud.aiplatform as aip
    from google.cloud import aiplatform_v1
    from google.protobuf import json_format
    from google.auth.transport import requests as grequests

    aip.init(project=project_id, location=region)
    API_ENDPOINT = "{}-aiplatform.googleapis.com".format(region)

    # Basic model details from the SDK.
    model = aip.Model(model_id)
    model_dict = model.to_dict()
    model_metadata = model_dict['metadata']
    model_name = model_dict['displayName']
    model_creation_date = model_dict['createTime']
    model_type = model_metadata['modelType']
    number_training = model_metadata['trainingDataItemsCount']

    client_options = {
        "api_endpoint": API_ENDPOINT
    }
    model_path = model.resource_name
    client_model = aiplatform_v1.services.model_service.ModelServiceClient(client_options=client_options)

    # List the model's evaluations and keep the name of the last one returned.
    list_eval_request = aiplatform_v1.types.ListModelEvaluationsRequest(parent=model_path)
    list_eval = client_model.list_model_evaluations(request=list_eval_request)
    eval_name = ''
    for val in list_eval:
        eval_name = val.name

    # Fetch that evaluation and convert it to a plain dict.
    get_eval_request = aiplatform_v1.types.GetModelEvaluationRequest(name=eval_name)
    model_eval = client_model.get_model_evaluation(request=get_eval_request)
    model_eval_data = json_format.MessageToDict(model_eval._pb)
    model_metrics = model_eval_data['metrics']
    average_precision = model_metrics.get('auPrc')
    confidence_metrics = model_metrics['confidenceMetrics']

    # Use the first entry whose confidence threshold is at least 0.5.
    confidence_threshold = -1
    f1_score = -1
    precision = -1
    recall = -1
    for item in confidence_metrics:
        confidence_threshold_temp = item['confidenceThreshold']
        if confidence_threshold_temp >= 0.5:
            confidence_threshold = confidence_threshold_temp
            f1_score = item['f1Score']
            precision = item['precision']
            recall = item['recall']
            break
    # auc_precision = precision
    # auc_recall = recall

    # Get an access token and read the training pipeline over REST.
    credentials, _ = google.auth.default()
    auth_request = grequests.Request()
    credentials.refresh(auth_request)
    training_pipeline_resource_name = model_dict['trainingPipeline']
    training_pipeline_url = f'https://{API_ENDPOINT}/v1beta1/{training_pipeline_resource_name}'
    headers = {
        'Authorization': f'Bearer {credentials.token}'
    }
    response = requests.get(training_pipeline_url, headers=headers)
    training_pipeline_detail = response.json()

    # The data split is only present if it was stored on the training pipeline.
    input_data_config = training_pipeline_detail.get('inputDataConfig', {})
    dataset_id = input_data_config.get('datasetId', '')
    fraction_split = input_data_config.get('fractionSplit', {})
    test_fraction = fraction_split.get('testFraction')
    training_fraction = fraction_split.get('trainingFraction')
    data_split = f'{training_fraction}/{test_fraction}'

    # Dataset details from the SDK.
    dataset = aip.VideoDataset(dataset_id)
    dataset_resource = json_format.MessageToDict(dataset.gca_resource._pb)
    dataset_name = dataset_resource.get('displayName')
    dataset_creation_date = dataset_resource.get('createTime')
    labels = dataset_resource['labels']
    dataset_type = labels.get('aiplatform.googleapis.com/dataset_metadata_schema')

    data = {
        'model_id': model_id,
        'model_name': model_name,
        'model_creation_date': model_creation_date,
        'model_type': model_type,
        'number_training': number_training,
        'average_precision': average_precision,
        'precision': precision,
        'recall': recall,
        'data_split': data_split,
        'dataset_name': dataset_name,
        'dataset_type': dataset_type,
        'dataset_id': dataset_id,
        'dataset_creation_date': dataset_creation_date,
    }
    # Hand the collected metadata back to the caller.
    return data
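The threshold lookup in the function above takes the first entry at or above 0.5, which relies on the entries arriving in ascending order. As a minimal sketch that does not depend on ordering, the selection could be pulled into a small helper. The name `pick_confidence_metrics` and the sample values are hypothetical and only for illustration; the field names are the ones already used in the code above.

```python
from typing import Iterable, Mapping, Optional


def pick_confidence_metrics(
    confidence_metrics: Iterable[Mapping[str, float]],
    min_threshold: float = 0.5,
) -> Optional[Mapping[str, float]]:
    """Return the entry with the lowest confidenceThreshold >= min_threshold.

    Returns None when no entry reaches the threshold.
    """
    def threshold(item: Mapping[str, float]) -> float:
        # Treat a missing threshold as 0.0 instead of raising KeyError.
        return item.get("confidenceThreshold", 0.0)

    candidates = [item for item in confidence_metrics if threshold(item) >= min_threshold]
    if not candidates:
        return None
    return min(candidates, key=threshold)


# Made-up sample entries, deliberately not sorted by threshold.
sample = [
    {"confidenceThreshold": 0.9, "precision": 0.95, "recall": 0.40, "f1Score": 0.56},
    {"confidenceThreshold": 0.5, "precision": 0.80, "recall": 0.70, "f1Score": 0.75},
    {"confidenceThreshold": 0.1, "precision": 0.55, "recall": 0.98, "f1Score": 0.70},
]
chosen = pick_confidence_metrics(sample)
```

Returning `None` instead of `-1` placeholders makes the "no matching threshold" case explicit for whoever consumes the result.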
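As another example, I found that when I create the dataset and train the model through the web UI, I can get the data split (training/test ratio) from the training job. But when I train in Vertex AI Pipelines, where I don't explicitly set the data split on AutoMLVideoTrainingJobRunOp, I can't get the data split from the training job details, so it seems to be saved only when it is set explicitly.
Another thing I noticed is that the API requests made by the Cloud Console (inspected with Chrome DevTools) return more (richer) data (items) than the public Vertex AI API.
I'm not sure whether this is temporary or intended/permanent behavior.
I would appreciate any ideas, comments, or help.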
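To make the missing-split case visible in the collected metadata, the split could be read with an explicit fallback rather than formatted as `None/None`. This is only a sketch over the JSON returned for the training pipeline: the helper name `describe_data_split` and the fallback string are hypothetical, and it does not recover whatever split the service actually applied.

```python
from typing import Any, Mapping


def describe_data_split(training_pipeline_detail: Mapping[str, Any]) -> str:
    """Format the stored training/test split, or flag that none was stored."""
    input_data_config = training_pipeline_detail.get("inputDataConfig", {})
    fraction_split = input_data_config.get("fractionSplit")
    if not fraction_split:
        # Nothing stored on the pipeline: say so instead of guessing a default.
        return "not set explicitly"
    training_fraction = fraction_split.get("trainingFraction")
    test_fraction = fraction_split.get("testFraction")
    return f"{training_fraction}/{test_fraction}"


# Illustrative payloads shaped like the REST response used above.
with_split = {
    "inputDataConfig": {
        "datasetId": "123",
        "fractionSplit": {"trainingFraction": 0.8, "testFraction": 0.2},
    }
}
without_split = {"inputDataConfig": {"datasetId": "123"}}
```

With these payloads, `describe_data_split(with_split)` gives `"0.8/0.2"` and `describe_data_split(without_split)` gives `"not set explicitly"`.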