I have a scikit-learn model saved in Cloud Storage that I am trying to deploy with AI Platform Prediction. When I deploy this model to a regional endpoint, the deployment completes successfully:
➜ gcloud ai-platform versions describe regional_endpoint_version --model=regional --region us-central1
Using endpoint [https://us-central1-ml.googleapis.com/]
autoScaling:
  minNodes: 1
createTime: '2020-12-30T15:21:55Z'
deploymentUri: <REMOVED>
description: testing deployment to a regional endpoint
etag: <REMOVED>
framework: SCIKIT_LEARN
isDefault: true
machineType: n1-standard-4
name: <REMOVED>
pythonVersion: '3.7'
runtimeVersion: '2.2'
state: READY
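However, when I try to deploy the exact same model to the global endpoint, using the same Python and runtime versions, the deployment fails with an error about loading the model: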
(aiz) ➜ stanford_nlp_a3 gcloud ai-platform versions describe public_object --model=global
Using endpoint [https://ml.googleapis.com/]
autoScaling: {}
createTime: '2020-12-30T15:12:11Z'
deploymentUri: <REMOVED>
description: testing global endpoint deployment
errorMessage: 'Create Version failed. Bad model detected with error: "Error loading
  the model"'
etag: <REMOVED>
framework: SCIKIT_LEARN
machineType: mls1-c1-m2
name: <REMOVED>
pythonVersion: '3.7'
runtimeVersion: '2.2'
state: FAILED
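I tried making the .joblib object public to rule out a permissions difference between the two endpoints as the cause, but deploying to the global endpoint still fails. I removed the deploymentUri from this post because I have been experimenting with the permissions on the model object, but the path is identical in both model versions.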
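The machine types for the two deployments have to be different, and for the regional deployment I use min nodes = 1 whereas for the global one I can use min nodes = 0. Apart from that and the etags, everything else is exactly the same.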
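To double-check that claim, here is a minimal sketch that compares the configuration fields from the two `describe` outputs above. The dictionaries are copied by hand from those outputs; the helper name `diff_versions` is just something I made up for illustration, and fields that differ trivially (name, description, createTime, etag) are left out.

```python
# Fields copied by hand from the two `gcloud ai-platform versions describe`
# outputs above. Redacted and trivially different fields are omitted.
regional = {
    "autoScaling": {"minNodes": 1},
    "framework": "SCIKIT_LEARN",
    "machineType": "n1-standard-4",
    "pythonVersion": "3.7",
    "runtimeVersion": "2.2",
    "state": "READY",
}
global_ = {
    "autoScaling": {},
    "framework": "SCIKIT_LEARN",
    "machineType": "mls1-c1-m2",
    "pythonVersion": "3.7",
    "runtimeVersion": "2.2",
    "state": "FAILED",
}


def diff_versions(a, b):
    """Return {field: (a_value, b_value)} for every field whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


differences = diff_versions(regional, global_)
for field, (regional_value, global_value) in differences.items():
    print(f"{field}: regional={regional_value!r} global={global_value!r}")
```

Only the autoscaling setting, the machine type, and the resulting state show up as differences; framework, Python version, and runtime version match.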
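I could not find anything on the AI Platform Prediction regional endpoints documentation page indicating that certain models can only be deployed to a certain type of endpoint. The "Error loading the model" message does not give me much to go on, since it does not appear to be a permissions problem with the model file.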
When I add the --log-http option to the create version command, I see that the error code is 3, but the message does not reveal any additional information:
➜ ~ gcloud ai-platform versions create $VERSION_NAME \
--model=$MODEL_NAME \
--origin=$MODEL_DIR \
--runtime-version=2.2 \
--framework=$FRAMEWORK \
--python-version=3.7 \
--machine-type=mls1-c1-m2 --log-http
Using endpoint [https://ml.googleapis.com/]
=======================
==== request start ====
...
...
the final response from the server looks like this:
---- response start ----
status: 200
-- headers start --
<headers>
-- headers end --
-- body start --
{
  "name": "<name>",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.ml.v1.OperationMetadata",
    "createTime": "2020-12-30T22:53:30Z",
    "startTime": "2020-12-30T22:53:30Z",
    "endTime": "2020-12-30T22:54:37Z",
    "operationType": "CREATE_VERSION",
    "modelName": "<name>",
    "version": {
      <version info>
    }
  },
  "done": true,
  "error": {
    "code": 3,
    "message": "Create Version failed. Bad model detected with error: \"Error loading the model\""
  }
}
-- body end --
total round trip time (request+response): 0.096 secs
---- response end ----
----------------------
Creating version (this might take a few minutes)......failed.
ERROR: (gcloud.ai-platform.versions.create) Create Version failed. Bad model detected with error: "Error loading the model"
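For what it is worth, the `error` object in that response appears to follow the `google.rpc.Status` shape, in which code 3 is `INVALID_ARGUMENT` and a permissions failure would be code 7 (`PERMISSION_DENIED`). Below is a minimal sketch of how I read the body. The JSON is trimmed by hand to the `done` and `error` fields, and the lookup table is a partial copy of the `google.rpc.Code` values, not something the CLI prints.

```python
import json

# Final response body from above, trimmed by hand to the relevant fields.
body = r'''
{
  "done": true,
  "error": {
    "code": 3,
    "message": "Create Version failed. Bad model detected with error: \"Error loading the model\""
  }
}
'''

# Partial table of google.rpc.Code values (only the ones relevant here).
RPC_CODES = {
    0: "OK",
    3: "INVALID_ARGUMENT",
    5: "NOT_FOUND",
    7: "PERMISSION_DENIED",
    13: "INTERNAL",
}

error = json.loads(body)["error"]
code_name = RPC_CODES.get(error["code"], "UNKNOWN_TO_THIS_TABLE")
print(f"code {error['code']} ({code_name}): {error['message']}")
```

So the service seems to be rejecting the model artifact itself as invalid rather than failing to read it, which fits with the public-object test not changing anything.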
Can anyone explain what I am missing here?