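I have set up a local cluster on my machine using microk8s and Kubeflow. I followed these installation instructions to get the cluster up and running. I have started a Jupyter server and written a Kubeflow pipeline.
The YAML file I use to define the component looks like this: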
name: beat_the_market - Preprocess
description: Preprocesses market data and loads into GCS bucket.
inputs:
- {name: project, type: String, description: GCP Project ID}
- {name: bucket, type: GCSPath, description: GCS bucket path}
- {name: ticker, type: String, description: Ticker symbol for selected stock}
outputs:
- {name: Trained model, type: Tensorflow model}
implementation:
  container:
    image: us.gcr.io/manceps-labs/beat_the_market:latest
    command: [python3, /opt/preprocess.py,
      --project, {inputValue: project},
      --bucket, {inputValue: bucket},
      --ticker, {inputValue: ticker}
      ]
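To make the intent of the command list concrete: each {inputValue: ...} entry is a placeholder that gets replaced with the matching pipeline argument when the container is launched. The sketch below is not how Kubeflow Pipelines implements this. It is a plain-Python illustration, and the helper name resolve_command and the sample argument values are hypothetical.

```python
def resolve_command(command, arguments):
    """Replace {inputValue: name} placeholders with the matching argument."""
    resolved = []
    for item in command:
        if isinstance(item, dict) and "inputValue" in item:
            # Placeholder: look up the value supplied for this input.
            resolved.append(str(arguments[item["inputValue"]]))
        else:
            # Plain string: passed through unchanged.
            resolved.append(item)
    return resolved


# The command list from the component definition, as a YAML parser
# would represent it (flow mappings become dicts).
command = [
    "python3", "/opt/preprocess.py",
    "--project", {"inputValue": "project"},
    "--bucket", {"inputValue": "bucket"},
    "--ticker", {"inputValue": "ticker"},
]

# Hypothetical argument values, for illustration only.
arguments = {"project": "my-project", "bucket": "gs://my-bucket", "ticker": "GOOG"}

print(resolve_command(command, arguments))
```

With these sample values, the container would be started with python3 /opt/preprocess.py --project my-project --bucket gs://my-bucket --ticker GOOG.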
Unfortunately, when I try to create an experiment using the Kubeflow Pipelines SDK, I get the following error:
2020-04-15 23:03:25,135 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1cc8a4c358>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /apis/v1beta1/experiments
2020-04-15 23:03:25,135 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1cc8a4c358>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /apis/v1beta1/experiments
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1cc8a4c358>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /apis/v1beta1/experiments
---------------------------------------------------------------------------
gaierror Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/urllib3/connection.py in _new_conn(self)
158 conn = connection.create_connection(
--> 159 (self._dns_host, self.port), self.timeout, **extra_kw)
160
/usr/local/lib/python3.6/dist-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
56
---> 57 for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
58 af, socktype, proto, canonname, sa = res
/usr/lib/python3.6/socket.py in getaddrinfo(host, port, family, type, proto, flags)
744 addrlist = []
--> 745 for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
746 af, socktype, proto, canonname, sa = res
gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
NewConnectionError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
599 body=body, headers=headers,
--> 600 chunked=chunked)
601
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
353 else:
--> 354 conn.request(method, url, **httplib_request_kw)
355
/usr/lib/python3.6/http/client.py in request(self, method, url, body, headers, encode_chunked)
1238 """Send a complete request to the server."""
-> 1239 self._send_request(method, url, body, headers, encode_chunked)
1240
/usr/lib/python3.6/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
1284 body = _encode(body, 'body')
-> 1285 self.endheaders(body, encode_chunked=encode_chunked)
1286
/usr/lib/python3.6/http/client.py in endheaders(self, message_body, encode_chunked)
1233 raise CannotSendHeader()
-> 1234 self._send_output(message_body, encode_chunked=encode_chunked)
1235
/usr/lib/python3.6/http/client.py in _send_output(self, message_body, encode_chunked)
1025 del self._buffer[:]
-> 1026 self.send(msg)
1027
/usr/lib/python3.6/http/client.py in send(self, data)
963 if self.auto_open:
--> 964 self.connect()
965 else:
/usr/local/lib/python3.6/dist-packages/urllib3/connection.py in connect(self)
180 def connect(self):
--> 181 conn = self._new_conn()
182 self._prepare_conn(conn)
/usr/local/lib/python3.6/dist-packages/urllib3/connection.py in _new_conn(self)
167 raise NewConnectionError(
--> 168 self, "Failed to establish a new connection: %s" % e)
169
NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f1cc8b3e860>: Failed to establish a new connection: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
MaxRetryError Traceback (most recent call last)
<ipython-input-325-c8d6a70afd2d> in <module>
9 try:
---> 10 experiment = client.get_experiment(experiment_name=experiment_name)
11 except:
/usr/local/lib/python3.6/dist-packages/kfp/_client.py in get_experiment(self, experiment_id, experiment_name)
213 while next_page_token is not None:
--> 214 list_experiments_response = self.list_experiments(page_size=100, page_token=next_page_token)
215 next_page_token = list_experiments_response.next_page_token
/usr/local/lib/python3.6/dist-packages/kfp/_client.py in list_experiments(self, page_token, page_size, sort_by)
193 response = self._experiment_api.list_experiment(
--> 194 page_token=page_token, page_size=page_size, sort_by=sort_by)
195 return response
/usr/local/lib/python3.6/dist-packages/kfp_server_api/api/experiment_service_api.py in list_experiment(self, **kwargs)
347 else:
--> 348 (data) = self.list_experiment_with_http_info(**kwargs) # noqa: E501
349 return data
/usr/local/lib/python3.6/dist-packages/kfp_server_api/api/experiment_service_api.py in list_experiment_with_http_info(self, **kwargs)
429 _request_timeout=params.get('_request_timeout'),
--> 430 collection_formats=collection_formats)
/usr/local/lib/python3.6/dist-packages/kfp_server_api/api_client.py in call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, async_req, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
329 _return_http_data_only, collection_formats,
--> 330 _preload_content, _request_timeout)
331 else:
/usr/local/lib/python3.6/dist-packages/kfp_server_api/api_client.py in __call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
160 _preload_content=_preload_content,
--> 161 _request_timeout=_request_timeout)
162
/usr/local/lib/python3.6/dist-packages/kfp_server_api/api_client.py in request(self, method, url, query_params, headers, post_params, body, _preload_content, _request_timeout)
350 _request_timeout=_request_timeout,
--> 351 headers=headers)
352 elif method == "HEAD":
/usr/local/lib/python3.6/dist-packages/kfp_server_api/rest.py in GET(self, url, headers, query_params, _preload_content, _request_timeout)
237 _request_timeout=_request_timeout,
--> 238 query_params=query_params)
239
/usr/local/lib/python3.6/dist-packages/kfp_server_api/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
210 timeout=timeout,
--> 211 headers=headers)
212 except urllib3.exceptions.SSLError as e:
/usr/local/lib/python3.6/dist-packages/urllib3/request.py in request(self, method, url, fields, headers, **urlopen_kw)
67 headers=headers,
---> 68 **urlopen_kw)
69 else:
/usr/local/lib/python3.6/dist-packages/urllib3/request.py in request_encode_url(self, method, url, fields, headers, **urlopen_kw)
88
---> 89 return self.urlopen(method, url, **extra_kw)
90
/usr/local/lib/python3.6/dist-packages/urllib3/poolmanager.py in urlopen(self, method, url, redirect, **kw)
323 else:
--> 324 response = conn.urlopen(method, u.request_uri, **kw)
325
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
666 release_conn=release_conn, body_pos=body_pos,
--> 667 **response_kw)
668
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
666 release_conn=release_conn, body_pos=body_pos,
--> 667 **response_kw)
668
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
666 release_conn=release_conn, body_pos=body_pos,
--> 667 **response_kw)
668
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
637 retries = retries.increment(method, url, error=e, _pool=self,
--> 638 _stacktrace=sys.exc_info()[2])
639 retries.sleep()
/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
398 if new_retry.is_exhausted():
--> 399 raise MaxRetryError(_pool, url, error or ResponseError(cause))
400
MaxRetryError: HTTPConnectionPool(host='ml-pipeline.kubeflow.svc.cluster.local', port=8888): Max retries exceeded with url: /apis/v1beta1/experiments?page_token=&page_size=100&sort_by= (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1cc8b3e860>: Failed to establish a new connection: [Errno -2] Name or service not known',))
During handling of the above exception, another exception occurred:
gaierror Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/urllib3/connection.py in _new_conn(self)
158 conn = connection.create_connection(
--> 159 (self._dns_host, self.port), self.timeout, **extra_kw)
160
/usr/local/lib/python3.6/dist-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
56
---> 57 for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
58 af, socktype, proto, canonname, sa = res
/usr/lib/python3.6/socket.py in getaddrinfo(host, port, family, type, proto, flags)
744 addrlist = []
--> 745 for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
746 af, socktype, proto, canonname, sa = res
gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
NewConnectionError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
599 body=body, headers=headers,
--> 600 chunked=chunked)
601
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
353 else:
--> 354 conn.request(method, url, **httplib_request_kw)
355
/usr/lib/python3.6/http/client.py in request(self, method, url, body, headers, encode_chunked)
1238 """Send a complete request to the server."""
-> 1239 self._send_request(method, url, body, headers, encode_chunked)
1240
/usr/lib/python3.6/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
1284 body = _encode(body, 'body')
-> 1285 self.endheaders(body, encode_chunked=encode_chunked)
1286
/usr/lib/python3.6/http/client.py in endheaders(self, message_body, encode_chunked)
1233 raise CannotSendHeader()
-> 1234 self._send_output(message_body, encode_chunked=encode_chunked)
1235
/usr/lib/python3.6/http/client.py in _send_output(self, message_body, encode_chunked)
1025 del self._buffer[:]
-> 1026 self.send(msg)
1027
/usr/lib/python3.6/http/client.py in send(self, data)
963 if self.auto_open:
--> 964 self.connect()
965 else:
/usr/local/lib/python3.6/dist-packages/urllib3/connection.py in connect(self)
180 def connect(self):
--> 181 conn = self._new_conn()
182 self._prepare_conn(conn)
/usr/local/lib/python3.6/dist-packages/urllib3/connection.py in _new_conn(self)
167 raise NewConnectionError(
--> 168 self, "Failed to establish a new connection: %s" % e)
169
NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f1cc8a4c5f8>: Failed to establish a new connection: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
MaxRetryError Traceback (most recent call last)
<ipython-input-325-c8d6a70afd2d> in <module>
10 experiment = client.get_experiment(experiment_name=experiment_name)
11 except:
---> 12 experiment = client.create_experiment(experiment_name)
13
14 print(experiment)
/usr/local/lib/python3.6/dist-packages/kfp/_client.py in create_experiment(self, name)
172 logging.info('Creating experiment {}.'.format(name))
173 experiment = kfp_server_api.models.ApiExperiment(name=name)
--> 174 experiment = self._experiment_api.create_experiment(body=experiment)
175
176 if self._is_ipython():
/usr/local/lib/python3.6/dist-packages/kfp_server_api/api/experiment_service_api.py in create_experiment(self, body, **kwargs)
52 return self.create_experiment_with_http_info(body, **kwargs) # noqa: E501
53 else:
---> 54 (data) = self.create_experiment_with_http_info(body, **kwargs) # noqa: E501
55 return data
56
/usr/local/lib/python3.6/dist-packages/kfp_server_api/api/experiment_service_api.py in create_experiment_with_http_info(self, body, **kwargs)
129 _preload_content=params.get('_preload_content', True),
130 _request_timeout=params.get('_request_timeout'),
--> 131 collection_formats=collection_formats)
132
133 def delete_experiment(self, id, **kwargs): # noqa: E501
/usr/local/lib/python3.6/dist-packages/kfp_server_api/api_client.py in call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, async_req, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
328 response_type, auth_settings,
329 _return_http_data_only, collection_formats,
--> 330 _preload_content, _request_timeout)
331 else:
332 thread = self.pool.apply_async(self.__call_api, (resource_path,
/usr/local/lib/python3.6/dist-packages/kfp_server_api/api_client.py in __call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
159 post_params=post_params, body=body,
160 _preload_content=_preload_content,
--> 161 _request_timeout=_request_timeout)
162
163 self.last_response = response_data
/usr/local/lib/python3.6/dist-packages/kfp_server_api/api_client.py in request(self, method, url, query_params, headers, post_params, body, _preload_content, _request_timeout)
371 _preload_content=_preload_content,
372 _request_timeout=_request_timeout,
--> 373 body=body)
374 elif method == "PUT":
375 return self.rest_client.PUT(url,
/usr/local/lib/python3.6/dist-packages/kfp_server_api/rest.py in POST(self, url, headers, query_params, post_params, body, _preload_content, _request_timeout)
273 _preload_content=_preload_content,
274 _request_timeout=_request_timeout,
--> 275 body=body)
276
277 def PUT(self, url, headers=None, query_params=None, post_params=None,
/usr/local/lib/python3.6/dist-packages/kfp_server_api/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
165 preload_content=_preload_content,
166 timeout=timeout,
--> 167 headers=headers)
168 elif headers['Content-Type'] == 'application/x-www-form-urlencoded': # noqa: E501
169 r = self.pool_manager.request(
/usr/local/lib/python3.6/dist-packages/urllib3/request.py in request(self, method, url, fields, headers, **urlopen_kw)
70 return self.request_encode_body(method, url, fields=fields,
71 headers=headers,
---> 72 **urlopen_kw)
73
74 def request_encode_url(self, method, url, fields=None, headers=None,
/usr/local/lib/python3.6/dist-packages/urllib3/request.py in request_encode_body(self, method, url, fields, headers, encode_multipart, multipart_boundary, **urlopen_kw)
148 extra_kw.update(urlopen_kw)
149
--> 150 return self.urlopen(method, url, **extra_kw)
/usr/local/lib/python3.6/dist-packages/urllib3/poolmanager.py in urlopen(self, method, url, redirect, **kw)
322 response = conn.urlopen(method, url, **kw)
323 else:
--> 324 response = conn.urlopen(method, u.request_uri, **kw)
325
326 redirect_location = redirect and response.get_redirect_location()
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
665 timeout=timeout, pool_timeout=pool_timeout,
666 release_conn=release_conn, body_pos=body_pos,
--> 667 **response_kw)
668
669 def drain_and_release_conn(response):
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
665 timeout=timeout, pool_timeout=pool_timeout,
666 release_conn=release_conn, body_pos=body_pos,
--> 667 **response_kw)
668
669 def drain_and_release_conn(response):
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
665 timeout=timeout, pool_timeout=pool_timeout,
666 release_conn=release_conn, body_pos=body_pos,
--> 667 **response_kw)
668
669 def drain_and_release_conn(response):
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
636
637 retries = retries.increment(method, url, error=e, _pool=self,
--> 638 _stacktrace=sys.exc_info()[2])
639 retries.sleep()
640
/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
397
398 if new_retry.is_exhausted():
--> 399 raise MaxRetryError(_pool, url, error or ResponseError(cause))
400
401 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)
MaxRetryError: HTTPConnectionPool(host='ml-pipeline.kubeflow.svc.cluster.local', port=8888): Max retries exceeded with url: /apis/v1beta1/experiments (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1cc8a4c5f8>: Failed to establish a new connection: [Errno -2] Name or service not known',))
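Note that I haven't included all of the retries, but I think you get the idea. I also tried using the IP provided by microk8s.enable, which gives me a sort-of-successful output, but all of the values are None, which still isn't what I want.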
client = kfp.Client(host='http://xx.xx.xx.xx.xip.io')
experiment = client.create_experiment('test')
Experiment link here
{'created_at': None, 'description': None, 'id': None, 'name': None}
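Every trace above ends in gaierror: [Errno -2] Name or service not known for the host ml-pipeline.kubeflow.svc.cluster.local, so one thing that can be checked is whether that name resolves at all from the environment running the notebook. Here is a minimal standard-library sketch of that check. The helper name can_resolve is my own, and the host and port 8888 are copied from the MaxRetryError message. It only tests name resolution, not whether the Pipelines API is actually reachable.

```python
import socket


def can_resolve(host, port=8888):
    """Return True if `host` resolves to at least one address, else False."""
    try:
        # Same lookup that fails at the bottom of the traceback.
        results = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        return len(results) > 0
    except socket.gaierror:
        # Raised for "[Errno -2] Name or service not known" and similar failures.
        return False


if __name__ == "__main__":
    # Host copied from the MaxRetryError message above.
    host = "ml-pipeline.kubeflow.svc.cluster.local"
    print(host, "resolves:", can_resolve(host))
```

If this prints False, the client is presumably running somewhere the cluster-internal DNS name is not visible, which would be consistent with the error above.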
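Any help would be greatly appreciated. Let me know if you need any other output to assess this properly. I'm still learning Kubeflow, so I'm not sure how to debug this, and I couldn't find much about it in the Kubeflow docs, the microk8s docs, or other threads. I'm currently working through these two examples.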
https://github.com/kubeflow/examples/blob/master/named_entity_recognition/notebooks/Pipeline.ipynb