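I built a pipeline with an AutoML step and use the resulting artifact to register the model. The artifact is a serialized model stored as a single large file: model_data. I deserialize the model with pickle.load in the init() function of the scoring file, but this fails during deployment. When I unpickle the model in the main notebook, it works fine. This is driving me crazy. Please help, guys!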

AutoML-Pipeline.ipynb

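import logging

from azureml.pipeline.core import PipelineData, TrainingOutput
from azureml.pipeline.steps import AutoMLStep
from azureml.train.automl import AutoMLConfig

automl_settings = {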
    "experiment_timeout_minutes": 30,
    "primary_metric": 'AUC_weighted',
    "max_concurrent_iterations": 3, 
    "max_cores_per_iteration": -1,
    "enable_dnn": True,
    "enable_early_stopping": True,
    "validation_size": 0.3,
    "verbosity": logging.INFO,
    "enable_voting_ensemble": False,
    "enable_stack_ensemble": False,
}

automl_config = AutoMLConfig(task = 'classification',
                             debug_log = 'automl_errors.log',
                             path = ".",
                             compute_target=compute_target,
                             training_data = train_ds,
                             label_column_name = target_column_name,
                             **automl_settings
                            )

metrics_output_name = 'metrics_output'
best_model_output_name = 'best_model_output'

metrics_data = PipelineData(name='metrics_data',
                           datastore=dstor,
                           pipeline_output_name=metrics_output_name,
                           training_output=TrainingOutput(type='Metrics'))
model_data = PipelineData(name='model_data',
                           datastore=dstor,
                           pipeline_output_name=best_model_output_name,
                           training_output=TrainingOutput(type='Model'))

automl_step = AutoMLStep(
    name='automl_module',
    automl_config=automl_config,
    outputs=[metrics_data, model_data],
    allow_reuse=False)

score_file_v_1_0_0.py

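import os
import pickle

# logging_utilities and logger are defined elsewhere in the scoring file (not shown).

def init():
    global model
    # AZUREML_MODEL_DIR points at the directory holding the registered model files.
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model_data')
    try:
        with open(model_path, "rb") as f:
            model = pickle.load(f)
    except Exception as e:
        # Log the full traceback, then re-raise so the deployment reports the failure.
        logging_utilities.log_traceback(e, logger)
        raise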

AutoML-Pipeline.ipynb

model = Model(ws, 'AutoML-Product')
automl_env = Environment.from_conda_specification(name = 'automl_env', file_path = 'conda_env_v_1_0_0.yml')
inference_config=InferenceConfig(entry_script="scoring_file_v_1_0_0.py",environment=automl_env)
aciconfig=AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=1, 
                                               tags={'type': "automl_product"}, 
                                               description='Product Classification')
aci_service=Model.deploy(ws, "automl-product-classification", [model], inference_config, aciconfig)
aci_service.wait_for_deployment(True)

Error:

WebserviceException: WebserviceException: Message: Service deployment polling reached non-successful terminal state, current service state. 
Error:
{
  "code": "AciDeploymentFailed",
  "statusCode": 400,
  "message": "Aci Deployment failed with exception: Your scoring file's init() function restarts frequently. You can address the error by increasing the value of memory_gb in deployment_config.",
  "details": [
    {
      "code": "ScoreInitRestart",
      "message": "Your scoring file's init() function restarts frequently. You can address the error by increasing the value of memory_gb in deployment_config."
    }
  ]
}

This runs successfully in the notebook: AutoML-Pipeline.ipynb

import pickle
path=Model.get_model_path('AutoML-Product',None,ws)
with open(path, "rb" ) as f:
    best_model = pickle.load(f)
best_model
>>>>>
PipelineWithYTransformations(Pipeline={'memory': None,
                                       'steps': [('datatransformer',
                                                  DataTransformer(enable_dnn=True, enable_feature_sweeping=True, feature_sweeping_config={}, feature_sweeping_timeout=86400, featurization_config=None, force_text_dnn=False, is_cross_validation=False, is_onnx_compatible=False, observer=None, task='classification', working_dir='/mn...
    with_std=True
)),
                                                 ('LogisticRegression',
                                                  LogisticRegression(C=0.02811768697974228,
                                                                     class_weight='balanced',
                                                                     dual=False,
                                                                     fit_intercept=True,
                                                                     intercept_scaling=1,
                                                                     l1_ratio=None,
                                                                     max_iter=100,
                                                                     multi_class='ovr',
                                                                     n_jobs=-1,
                                                                     penalty='l2',
                                                                     random_state=None,
                                                                     solver='saga',
                                                                     tol=0.0001,
                                                                     verbose=0,
                                                                     warm_start=False))],
                                       'verbose': False},
                             y_transformer={},
                             y_transformer_name='LabelEncoder')

1 Answer

The error message indicates that you need to increase the memory allocated to the deployment:

aciconfig=AciWebservice.deploy_configuration(...,memory_gb=1,...)

Raise memory_gb from 1 to a value x that is large enough for your model to run:

aciconfig=AciWebservice.deploy_configuration(...,memory_gb=x,...)
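
One way to pick a starting value for x is to measure how much memory unpickling the model takes on a machine where it loads successfully. The sketch below uses only the standard library. The helper names, the headroom factor of 2, and the rounding to 0.5 GB steps are illustrative assumptions, not part of the Azure ML SDK. tracemalloc only sees allocations made through Python's allocators, so treat the result as a lower bound: the web server and Python runtime inside the container need memory too.

```python
import math
import pickle
import tracemalloc

GIB = 1024 ** 3


def peak_unpickle_bytes(path):
    """Return the peak number of bytes traced while unpickling the file at `path`."""
    tracemalloc.start()
    try:
        with open(path, "rb") as f:
            _model = pickle.load(f)  # keep a reference so the object stays alive
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def suggest_memory_gb(peak_bytes, headroom=2.0, floor_gb=1.0):
    """Scale the measured peak by `headroom`, round up to the next 0.5 GB,
    and never go below `floor_gb`. Both defaults are assumptions."""
    needed_gb = peak_bytes * headroom / GIB
    return max(floor_gb, math.ceil(needed_gb * 2) / 2)
```

For example, passing the path returned by Model.get_model_path(...) to peak_unpickle_bytes and the result to suggest_memory_gb gives a candidate for memory_gb. If the deployment still restarts, increase it further.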

Your scoring file's init() function restarts frequently. You can address the error by increasing the value of memory_gb in deployment_config.

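This suggests the problem arises at the pickle.load() call, since that is the point where init() reads the whole model into memory.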

Answered 2022-01-07T10:38:16.327