
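I'm using Vertex AI for AutoML video classification, and I want to retrieve some of the model and dataset details that I can see in the web UI (Cloud Console). I'm using the AI Platform Python SDK or the REST API.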

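For example, the model API returns the number of "training videos" but not the number of test videos (shown in the web UI under model details, EVALUATE tab).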

[Screenshot: Vertex AI model evaluation]
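Because the API reports only the training count, one rough workaround is to estimate the test count from the fraction split recorded on the training pipeline. This is a sketch under assumptions: `estimate_test_items` is a hypothetical helper, it assumes `trainingDataItemsCount` counts only training items, and it assumes the split was applied proportionally. The result is therefore an approximation, not necessarily the number the console shows.

```python
from typing import Optional, Union


def estimate_test_items(training_items: Union[int, str],
                        training_fraction: Optional[float],
                        test_fraction: Optional[float]) -> Optional[int]:
    """Roughly estimate the number of test items from a fraction split.

    Hypothetical helper: assumes the fractions were applied proportionally,
    so the real per-split counts may differ slightly.
    """
    if not training_fraction or test_fraction is None:
        # No explicit split recorded: nothing to base an estimate on.
        return None
    # int64 fields such as trainingDataItemsCount arrive as strings in JSON.
    return round(int(training_items) * test_fraction / training_fraction)


print(estimate_test_items("80", 0.8, 0.2))
```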

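Similarly, in the model properties tab of the web UI, there are fields I can't get through the API: training time, total number of items, algorithm, and objective.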

[Screenshot: Vertex AI model properties]
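For the training time, one option (an assumption, not confirmed against how the console defines it) is to compute the duration from the `startTime` and `endTime` fields of the TrainingPipeline resource, which the code below already fetches through the REST API. The pipeline's `trainingTaskDefinition` (a schema URI) and `trainingTaskInputs` may also help identify the objective and model type. `parse_rfc3339` and `training_duration_seconds` are hypothetical helper names; the parser assumes UTC timestamps ending in `Z`, as produced by the JSON mapping of protobuf `Timestamp`.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def parse_rfc3339(ts: str) -> datetime:
    """Parse a UTC RFC 3339 timestamp such as '2021-10-12T09:00:00.123456789Z'."""
    if not ts.endswith("Z"):
        raise ValueError(f"expected a UTC timestamp ending in 'Z': {ts!r}")
    body = ts[:-1]
    if "." in body:
        base, frac = body.split(".", 1)
        # datetime keeps only microseconds: pad or truncate the fraction to 6 digits.
        frac = (frac + "000000")[:6]
    else:
        base, frac = body, "000000"
    dt = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return dt.replace(microsecond=int(frac), tzinfo=timezone.utc)


def training_duration_seconds(pipeline: Dict[str, Any]) -> Optional[float]:
    """Seconds between a training pipeline's startTime and endTime, if both exist."""
    start = pipeline.get("startTime")
    end = pipeline.get("endTime")
    if not start or not end:
        return None  # e.g. the pipeline is still running or never started
    return (parse_rfc3339(end) - parse_rfc3339(start)).total_seconds()


example = {"startTime": "2021-10-12T09:00:00.500Z", "endTime": "2021-10-12T11:30:00Z"}
print(training_duration_seconds(example))
```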

For the dataset details, I'd like to get the number of labeled and unlabeled videos, the labels, and the count for each label.

[Screenshot: dataset details, labels]
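For the labeled/unlabeled counts, a possible approach is to page through the dataset's data items with `DatasetServiceClient.list_data_items` and each item's annotations with `list_annotations` (both in `aiplatform_v1`), then aggregate the results locally. Only the aggregation step is sketched here, as a pure function. The input shape (data item ID mapped to label display names) is a hypothetical structure built from those responses, and how a label name is read from an annotation's `payload` depends on the annotation schema, which is an assumption. This sketch counts each label once per video; the console may count annotations instead.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_labels(item_labels: Dict[str, List[str]]) -> Tuple[int, int, Counter]:
    """Count labeled items, unlabeled items, and items per label.

    item_labels maps a data item ID to the label names of its annotations
    (hypothetical input shape, e.g. assembled from ListDataItems + ListAnnotations).
    """
    labeled = sum(1 for labels in item_labels.values() if labels)
    unlabeled = len(item_labels) - labeled
    per_label: Counter = Counter()
    for labels in item_labels.values():
        # Count each label once per item, even if it annotates several segments.
        per_label.update(set(labels))
    return labeled, unlabeled, per_label


print(summarize_labels({"v1": ["cat"], "v2": ["dog", "cat", "cat"], "v3": []}))
```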

This is the code I use to fetch the data (as a component in a Vertex AI Pipeline):

def get_metadata(project_id, region, model_id):
    # Imports live inside the function so it can run as a self-contained pipeline component.
    import requests

    import google.auth
    import google.cloud.aiplatform as aip
    from google.cloud import aiplatform_v1
    from google.protobuf import json_format
    from google.auth.transport import requests as grequests

    aip.init(project=project_id, location=region)
    API_ENDPOINT = f"{region}-aiplatform.googleapis.com"

    # Basic model details from the Model resource.
    model = aip.Model(model_id)
    model_dict = model.to_dict()
    model_metadata = model_dict.get('metadata', {})

    model_name = model_dict['displayName']
    model_creation_date = model_dict['createTime']
    model_type = model_metadata.get('modelType')
    number_training = model_metadata.get('trainingDataItemsCount')

    # Evaluation metrics. ListModelEvaluations already returns full
    # ModelEvaluation messages, so a separate GetModelEvaluation call is not needed.
    client_options = {"api_endpoint": API_ENDPOINT}
    client_model = aiplatform_v1.services.model_service.ModelServiceClient(client_options=client_options)
    evaluations = list(client_model.list_model_evaluations(parent=model.resource_name))
    if not evaluations:
        raise RuntimeError(f'No model evaluations found for {model.resource_name}')
    # As before, use the last evaluation returned.
    model_eval_data = json_format.MessageToDict(evaluations[-1]._pb)

    model_metrics = model_eval_data.get('metrics', {})
    average_precision = model_metrics.get('auPrc')
    confidence_threshold = -1
    f1_score = -1
    precision = -1
    recall = -1

    # Take the first entry whose threshold is >= 0.5 (closest to 0.5 only if
    # the entries are sorted in ascending order).
    for item in model_metrics.get('confidenceMetrics', []):
        # MessageToDict drops zero-valued fields, hence .get() with defaults.
        if item.get('confidenceThreshold', 0.0) >= 0.5:
            confidence_threshold = item['confidenceThreshold']
            f1_score = item.get('f1Score', 0.0)
            precision = item.get('precision', 0.0)
            recall = item.get('recall', 0.0)
            break

    # Training pipeline details (dataset ID, data split) via the REST API.
    credentials, _ = google.auth.default(
        scopes=['https://www.googleapis.com/auth/cloud-platform'])
    credentials.refresh(grequests.Request())
    training_pipeline_resource_name = model_dict['trainingPipeline']

    training_pipeline_url = f'https://{API_ENDPOINT}/v1beta1/{training_pipeline_resource_name}'
    headers = {'Authorization': f'Bearer {credentials.token}'}
    response = requests.get(training_pipeline_url, headers=headers)
    response.raise_for_status()
    training_pipeline_detail = response.json()
    input_data_config = training_pipeline_detail.get('inputDataConfig', {})
    dataset_id = input_data_config.get('datasetId', '')
    # In practice fractionSplit only seemed to be present when the split was set explicitly.
    fraction_split = input_data_config.get('fractionSplit', {})
    test_fraction = fraction_split.get('testFraction')
    training_fraction = fraction_split.get('trainingFraction')
    data_split = f'{training_fraction}/{test_fraction}' if fraction_split else None

    # Dataset details.
    dataset = aip.VideoDataset(dataset_id)
    dataset_resource = json_format.MessageToDict(dataset.gca_resource._pb)
    dataset_name = dataset_resource.get('displayName')
    dataset_creation_date = dataset_resource.get('createTime')
    labels = dataset_resource.get('labels', {})
    dataset_type = labels.get('aiplatform.googleapis.com/dataset_metadata_schema')

    data = {
        'model_id': model_id,
        'model_name': model_name,
        'model_creation_date': model_creation_date,
        'model_type': model_type,
        'number_training': number_training,
        'average_precision': average_precision,
        'confidence_threshold': confidence_threshold,
        'f1_score': f1_score,
        'precision': precision,
        'recall': recall,
        'data_split': data_split,
        'dataset_name': dataset_name,
        'dataset_type': dataset_type,
        'dataset_id': dataset_id,
        'dataset_creation_date': dataset_creation_date,
    }
    return data

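Also, when I train a model through the web UI, I can get the data split (training/test ratio) from the training job created from the dataset. But when I train in Vertex AI Pipelines without explicitly setting the data split on AutoMLVideoTrainingJobRunOp, the training job details contain no data split, so it seems the split is only saved when it is set explicitly.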
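One way to make the reported split explicit is to check which split type, if any, the training pipeline's `inputDataConfig` carries, and fall back to a label when none is recorded. `describe_split` is a hypothetical helper, and the key names are assumed to follow the JSON form of `InputDataConfig`'s split options. To have the split recorded in the first place, passing it explicitly to the training op (for example `training_fraction_split` and `test_fraction_split`, assuming the component mirrors `AutoMLVideoTrainingJob.run` in the Python SDK) should help.

```python
from typing import Any, Dict

# Split options assumed to appear in the JSON form of InputDataConfig.
OTHER_SPLIT_KEYS = ("filterSplit", "predefinedSplit", "timestampSplit", "stratifiedSplit")


def describe_split(input_data_config: Dict[str, Any]) -> str:
    """Summarize how a training pipeline's data was split."""
    fraction = input_data_config.get("fractionSplit")
    if fraction:
        train = fraction.get("trainingFraction", 0.0)
        test = fraction.get("testFraction", 0.0)
        return f"{train}/{test}"
    for key in OTHER_SPLIT_KEYS:
        if key in input_data_config:
            return key
    # Nothing recorded: the service chose the split itself.
    return "unspecified (service default)"


print(describe_split({"datasetId": "123", "fractionSplit": {"trainingFraction": 0.8, "testFraction": 0.2}}))
```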

Another thing I noticed is that the API requests made by the Cloud Console (inspected with Chrome DevTools) return more, and richer, data than the public Vertex AI API.

I'm not sure whether this is temporary or intentional, permanent behavior.

I'd appreciate any ideas, comments, or help.

