python - How to dump and use multiple ML algorithm objects in one pickle file in an Azure ML workspace?

Question

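I am trying to create an ML model in an Azure ML workspace using a Jupyter notebook. I am not using Azure's AutoML feature or the designer, and I want to run the complete code that I prepared locally.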

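My ML model uses 3 different algorithms. I am confused about how to save all the objects in one pickle file so that I can use them later in the "inference config" and in the "Score.py" file. Also, once they are saved, how do I access them in "Score.py" (the main file that contains the driver code)?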

Currently I am using the following approach:

import pickle
f= 'prediction.pkl'
all_models=[Error_Message_countvector, ErrorMessage_tfidf_fit, model_naive]
with open(f, 'wb') as files:
    pickle.dump(all_models, files)

and I access the objects like this:

import pickle
with open('prediction.pkl', 'rb') as files:
    loaded_model = pickle.load(files)
cv_output = loaded_model[0].transform(input_series)
tfidf_output = loaded_model[1].transform(cv_output)
loaded_model_prediction = loaded_model[2].predict(tfidf_output)

Somehow this approach works fine when I run it in the same cell as the rest of the code. But when I deploy the complete model, it throws an error.

My "Score.py" file looks like this:

import json
from azureml.core.model import Model
import joblib
import pandas as pd

def init():
    global prediction_model 
    prediction_model_path = Model.get_model_path("prediction")    
    prediction_model = joblib.load(prediction_model_path)     

def run(data):
    try:
        data = json.loads(data)     
        input_string= str(data['errorMsg']).strip()             
        input_series=pd.Series(input_string)            
        cv_output= prediction_model[0].transform(input_series)
        tfidf_output = prediction_model[1].transform(cv_output) 
        result = prediction_model[2].predict(tfidf_output)
        # predict() returns a NumPy array, which json cannot serialize, so convert it to a list
        return {'response' : result.tolist() }

    except Exception as e:
        error = str(e)
        return {'response' : error }

The error I get during deployment is:

Error:
{
  "code": "AciDeploymentFailed",
  "statusCode": 400,
  "message": "Aci Deployment failed with exception: Error in entry script, AttributeError: module '__main__' has no attribute 'text_cleaning', please run print(service.get_logs()) to get details.",
  "details": [
    {
      "code": "CrashLoopBackOff",
      "message": "Error in entry script, AttributeError: module '__main__' has no attribute 'text_cleaning', please run print(service.get_logs()) to get details."
    }
  ]
}

Can anyone help me understand the problem, or find out whether something is missing or wrong in the code?

What is the correct way to save multiple algorithm objects in one pickle file?
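A minimal sketch of one way to keep several fitted objects in a single file, assuming the three objects are a CountVectorizer, a TfidfTransformer and a MultinomialNB (the question only shows that they have transform/predict methods): put them in a dict keyed by name and save it with joblib, which the scoring script already uses for loading. The toy training data, the keys count_vec, tfidf and model, and the classifier choice are hypothetical. This storage layout by itself does not address the text_cleaning error shown above.

```python
import joblib
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the real error messages and labels
messages = ["disk full", "disk quota exceeded", "timeout connecting", "connection timeout"]
labels = ["storage", "storage", "network", "network"]

# Fit the three objects separately, as in the question (classes are assumed)
count_vec = CountVectorizer().fit(messages)
counts = count_vec.transform(messages)
tfidf = TfidfTransformer().fit(counts)
model = MultinomialNB().fit(tfidf.transform(counts), labels)

# Save all three fitted objects in one file, keyed by name instead of by position
joblib.dump({"count_vec": count_vec, "tfidf": tfidf, "model": model}, "prediction.pkl")

# Later (for example in init() of the scoring script): load once, then look up by key
objects = joblib.load("prediction.pkl")
cv_output = objects["count_vec"].transform(["disk full again"])
tfidf_output = objects["tfidf"].transform(cv_output)
prediction = objects["model"].predict(tfidf_output)
print(prediction)
```

Looking objects up by name means the scoring code no longer depends on the order of a list, so adding another object later does not shift the indices used in run().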

score 0 · Accepted Answer

> Can anyone help me understand the problem, or find out whether something is missing or wrong in the code?

From your error message:

"Error in entry script, AttributeError: module '__main__' has no attribute 'text_cleaning' ..."

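It looks like the first object in prediction_model, the one used to compute cv_output, is trying to call a function, text_cleaning, that your scoring script has not imported. pickle does not store a function's code, only a reference to it (module name plus function name). Because text_cleaning was defined in the notebook, which runs as __main__, the process that loads the file must be able to find text_cleaning under that same module path.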

> What is the correct way to save multiple algorithm objects in one pickle file?

If you want to persist a sequence of transformations like the one in your example, the best practice is to use the Pipeline class from sklearn:

https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
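A minimal sketch of that suggestion, assuming the three steps are a CountVectorizer, a TfidfTransformer and a MultinomialNB and using hypothetical toy data: the steps become one Pipeline object, so the file holds a single estimator and the scoring code calls predict on raw strings. If a step uses a custom function such as text_cleaning, the Pipeline pickles it by reference just as before, so that function still has to be importable under the same module path when the file is loaded, for example by defining it in a separate module that is imported in the notebook and deployed alongside score.py.

```python
import joblib
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the real error messages and labels
messages = ["disk full", "disk quota exceeded", "timeout connecting", "connection timeout"]
labels = ["storage", "storage", "network", "network"]

# One estimator chaining the three steps; fit() fits and applies them in order
pipeline = Pipeline([
    ("count_vec", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("model", MultinomialNB()),
])
pipeline.fit(messages, labels)

# The whole chain is saved as a single object
joblib.dump(pipeline, "prediction.pkl")

# In the scoring script, one predict() call replaces the three manual steps
loaded = joblib.load("prediction.pkl")
prediction = loaded.predict(["disk full again"])
print(prediction)

# Individual steps are still reachable by name if needed
print(sorted(loaded.named_steps["count_vec"].vocabulary_))
```

In score.py, init() would then load this one object and run() would call prediction_model.predict(input_series), instead of indexing into a list.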
