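I want to set up an MLflow tracking server with external metric and artifact stores. I have the following Docker containers on one Docker network: mlflow-server, postgres, sftp-mlflow, and python-client. I was able to set up postgres and connect it to both mlflow-server and the client: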
mlflow server --backend-store-uri postgresql://postgres:<pass>@mlflow_db:5432/mlflow_db --default-artifact-root sftp://sftp:<pass>@sftp-mlflow:22 -h 0.0.0.0 -p 8000
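To see what the `--default-artifact-root` value above actually carries, the URI can be split with Python's standard `urllib.parse`. This is only a sketch for inspecting the string: the helper name `describe_artifact_uri` and the `PASSWORD` placeholder are made up for illustration, and it says nothing about how MLflow itself parses the URI.

```python
from urllib.parse import urlparse


def describe_artifact_uri(uri):
    """Split an artifact URI into the parts an SFTP client would need."""
    parsed = urlparse(uri)
    return {
        "scheme": parsed.scheme,
        "user": parsed.username,
        "host": parsed.hostname,
        "port": parsed.port,
        "path": parsed.path,
    }


# PASSWORD is a placeholder standing in for the real secret.
parts = describe_artifact_uri("sftp://sftp:PASSWORD@sftp-mlflow:22")
print(parts)
```

With the URI as written, the `path` component is empty, so no remote directory is named. Whether that matters depends on the sftp image in use and is worth checking against its documentation.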
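However, I cannot get the artifact store to work. I tried the following sftp image
and also followed this guide, but the artifact store still does not work =(
On my client, I have: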
import mlflow
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
remote_server_uri = "http://mlflow-server:8000" # set to your server URI
mlflow.set_tracking_uri(remote_server_uri)
# plotting (code omitted here; this is where `fig` is created)
fig.savefig("test.png")
ARTIFACT_URI = "sftp://sftp:<pass>@sftp-mlflow:22"
EXPERIMENT_NAME = "test"
mlflow.create_experiment(EXPERIMENT_NAME, artifact_location=ARTIFACT_URI)
mlflow.set_experiment(EXPERIMENT_NAME)
with mlflow.start_run():
    mlflow.log_param("a", 1)
    mlflow.log_metric("b", 2)
    mlflow.log_artifact("test.png")
When I run this code, I get:
2020/08/06 01:05:19 ERROR mlflow.utils.rest_utils: API request to http://mlflow-server:8000/api/2.0/mlflow/experiments/create failed with code 500 != 200, retrying up to 0 more times. API response body: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/run.py", line 24, in <module>
    mlflow.create_experiment(EXPERIMENT_NAME, artifact_location=ARTIFACT_URI)
  File "/usr/local/lib/python3.8/site-packages/mlflow/tracking/fluent.py", line 357, in create_experiment
    return MlflowClient().create_experiment(name, artifact_location)
  File "/usr/local/lib/python3.8/site-packages/mlflow/tracking/client.py", line 164, in create_experiment
    return self._tracking_client.create_experiment(name, artifact_location)
  File "/usr/local/lib/python3.8/site-packages/mlflow/tracking/_tracking_service/client.py", line 126, in create_experiment
    return self.store.create_experiment(
  File "/usr/local/lib/python3.8/site-packages/mlflow/store/tracking/rest_store.py", line 54, in create_experiment
    response_proto = self._call_endpoint(CreateExperiment, req_body)
  File "/usr/local/lib/python3.8/site-packages/mlflow/store/tracking/rest_store.py", line 32, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/usr/local/lib/python3.8/site-packages/mlflow/utils/rest_utils.py", line 142, in call_endpoint
    response = http_request(
  File "/usr/local/lib/python3.8/site-packages/mlflow/utils/rest_utils.py", line 86, in http_request
    raise MlflowException("API request to %s failed to return code 200 after %s tries" %
mlflow.exceptions.MlflowException: API request to http://mlflow-server:8000/api/2.0/mlflow/experiments/create failed to return code 200 after 3 tries
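Independent of the 500 error, `create_experiment` raises when an experiment with the same name already exists, so rerunning the client script as written would fail on that call once `test` has been created. A minimal sketch of a guard, assuming an `MlflowClient`-like object is passed in; the helper name `get_or_create_experiment` is hypothetical:

```python
def get_or_create_experiment(client, name, artifact_location=None):
    """Return the id of experiment `name`, creating it only if it is missing.

    `client` is assumed to expose `get_experiment_by_name` and
    `create_experiment`, as `mlflow.tracking.MlflowClient` does.
    """
    existing = client.get_experiment_by_name(name)
    if existing is not None:
        # Reuse the existing experiment instead of trying to create it again.
        return existing.experiment_id
    return client.create_experiment(name, artifact_location=artifact_location)
```

Usage would look like `get_or_create_experiment(MlflowClient(), EXPERIMENT_NAME, ARTIFACT_URI)`. The artifact location only takes effect on the first call, when the experiment is actually created.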
I can connect to the sftp storage using sftp from both the mlflow-server container and the Python client:
sftp -P 22 sftp@sftp-mlflow
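The same reachability check can be sketched from Python, which is the environment the MLflow client and server actually run in. The snippet below uses only the standard `socket` module to open a TCP connection and read the SSH identification line. It does not authenticate or speak SFTP, so a success only shows that the port is reachable from that process. The function name `read_ssh_banner` and the timeout value are illustrative choices.

```python
import socket


def read_ssh_banner(host, port=22, timeout=5.0):
    """Connect to host:port and return the first line the server sends.

    An SSH server normally greets with an identification string that
    starts with "SSH-". No login is attempted.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        data = b""
        # Keep reading until a full first line has arrived or the peer closes.
        while b"\n" not in data:
            chunk = sock.recv(256)
            if not chunk:
                break
            data += chunk
    first_line = data.split(b"\n", 1)[0]
    return first_line.decode("ascii", errors="replace").strip()
```

Calling `read_ssh_banner("sftp-mlflow", 22)` inside each container should return a line beginning with `SSH-` if the sftp container is reachable; a timeout or connection error would point at networking rather than at MLflow.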