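I have set up a local cluster on my machine using microk8s and Kubeflow. I followed these installation instructions to get the cluster up and running. I have started a Jupyter server and written a Kubeflow pipeline.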

The YAML file I use to define the component looks like this:

name: beat_the_market - Preprocess
description:  Preprocesses market data and loads into GCS bucket.
inputs:
- {name: project, type: String, description: GCP Project ID}
- {name: bucket, type: GCSPath, description: GCS bucket path}
- {name: ticker, type: String, description: Ticker symbol for selected stock}

outputs:
- {name: Trained model, type: Tensorflow model}

implementation:
    container:
        image: us.gcr.io/manceps-labs/beat_the_market:latest
        command: [python3, /opt/preprocess.py,
        --project, {inputValue: project},
        --bucket, {inputValue: bucket},
        --ticker, {inputValue: ticker}
        ] 

Unfortunately, when I try to create an experiment with the Kubeflow Pipelines SDK, I get the following error:

2020-04-15 23:03:25,135 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1cc8a4c358>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /apis/v1beta1/experiments
2020-04-15 23:03:25,135 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1cc8a4c358>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /apis/v1beta1/experiments
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1cc8a4c358>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /apis/v1beta1/experiments

---------------------------------------------------------------------------
gaierror                                  Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/urllib3/connection.py in _new_conn(self)
    158             conn = connection.create_connection(
--> 159                 (self._dns_host, self.port), self.timeout, **extra_kw)
    160 

/usr/local/lib/python3.6/dist-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
     56 
---> 57     for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
     58         af, socktype, proto, canonname, sa = res

/usr/lib/python3.6/socket.py in getaddrinfo(host, port, family, type, proto, flags)
    744     addrlist = []
--> 745     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    746         af, socktype, proto, canonname, sa = res

gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    599                                                   body=body, headers=headers,
--> 600                                                   chunked=chunked)
    601 

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    353         else:
--> 354             conn.request(method, url, **httplib_request_kw)
    355 

/usr/lib/python3.6/http/client.py in request(self, method, url, body, headers, encode_chunked)
   1238         """Send a complete request to the server."""
-> 1239         self._send_request(method, url, body, headers, encode_chunked)
   1240 

/usr/lib/python3.6/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
   1284             body = _encode(body, 'body')
-> 1285         self.endheaders(body, encode_chunked=encode_chunked)
   1286 

/usr/lib/python3.6/http/client.py in endheaders(self, message_body, encode_chunked)
   1233             raise CannotSendHeader()
-> 1234         self._send_output(message_body, encode_chunked=encode_chunked)
   1235 

/usr/lib/python3.6/http/client.py in _send_output(self, message_body, encode_chunked)
   1025         del self._buffer[:]
-> 1026         self.send(msg)
   1027 

/usr/lib/python3.6/http/client.py in send(self, data)
    963             if self.auto_open:
--> 964                 self.connect()
    965             else:

/usr/local/lib/python3.6/dist-packages/urllib3/connection.py in connect(self)
    180     def connect(self):
--> 181         conn = self._new_conn()
    182         self._prepare_conn(conn)

/usr/local/lib/python3.6/dist-packages/urllib3/connection.py in _new_conn(self)
    167             raise NewConnectionError(
--> 168                 self, "Failed to establish a new connection: %s" % e)
    169 

NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f1cc8b3e860>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
<ipython-input-325-c8d6a70afd2d> in <module>
      9 try:
---> 10     experiment = client.get_experiment(experiment_name=experiment_name)
     11 except:

/usr/local/lib/python3.6/dist-packages/kfp/_client.py in get_experiment(self, experiment_id, experiment_name)
    213     while next_page_token is not None:
--> 214       list_experiments_response = self.list_experiments(page_size=100, page_token=next_page_token)
    215       next_page_token = list_experiments_response.next_page_token

/usr/local/lib/python3.6/dist-packages/kfp/_client.py in list_experiments(self, page_token, page_size, sort_by)
    193     response = self._experiment_api.list_experiment(
--> 194         page_token=page_token, page_size=page_size, sort_by=sort_by)
    195     return response

/usr/local/lib/python3.6/dist-packages/kfp_server_api/api/experiment_service_api.py in list_experiment(self, **kwargs)
    347         else:
--> 348             (data) = self.list_experiment_with_http_info(**kwargs)  # noqa: E501
    349             return data

/usr/local/lib/python3.6/dist-packages/kfp_server_api/api/experiment_service_api.py in list_experiment_with_http_info(self, **kwargs)
    429             _request_timeout=params.get('_request_timeout'),
--> 430             collection_formats=collection_formats)

/usr/local/lib/python3.6/dist-packages/kfp_server_api/api_client.py in call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, async_req, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    329                                    _return_http_data_only, collection_formats,
--> 330                                    _preload_content, _request_timeout)
    331         else:

/usr/local/lib/python3.6/dist-packages/kfp_server_api/api_client.py in __call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    160             _preload_content=_preload_content,
--> 161             _request_timeout=_request_timeout)
    162 

/usr/local/lib/python3.6/dist-packages/kfp_server_api/api_client.py in request(self, method, url, query_params, headers, post_params, body, _preload_content, _request_timeout)
    350                                         _request_timeout=_request_timeout,
--> 351                                         headers=headers)
    352         elif method == "HEAD":

/usr/local/lib/python3.6/dist-packages/kfp_server_api/rest.py in GET(self, url, headers, query_params, _preload_content, _request_timeout)
    237                             _request_timeout=_request_timeout,
--> 238                             query_params=query_params)
    239 

/usr/local/lib/python3.6/dist-packages/kfp_server_api/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
    210                                               timeout=timeout,
--> 211                                               headers=headers)
    212         except urllib3.exceptions.SSLError as e:

/usr/local/lib/python3.6/dist-packages/urllib3/request.py in request(self, method, url, fields, headers, **urlopen_kw)
     67                                            headers=headers,
---> 68                                            **urlopen_kw)
     69         else:

/usr/local/lib/python3.6/dist-packages/urllib3/request.py in request_encode_url(self, method, url, fields, headers, **urlopen_kw)
     88 
---> 89         return self.urlopen(method, url, **extra_kw)
     90 

/usr/local/lib/python3.6/dist-packages/urllib3/poolmanager.py in urlopen(self, method, url, redirect, **kw)
    323         else:
--> 324             response = conn.urlopen(method, u.request_uri, **kw)
    325 

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    666                                 release_conn=release_conn, body_pos=body_pos,
--> 667                                 **response_kw)
    668 

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    666                                 release_conn=release_conn, body_pos=body_pos,
--> 667                                 **response_kw)
    668 

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    666                                 release_conn=release_conn, body_pos=body_pos,
--> 667                                 **response_kw)
    668 

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    637             retries = retries.increment(method, url, error=e, _pool=self,
--> 638                                         _stacktrace=sys.exc_info()[2])
    639             retries.sleep()

/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    398         if new_retry.is_exhausted():
--> 399             raise MaxRetryError(_pool, url, error or ResponseError(cause))
    400 

MaxRetryError: HTTPConnectionPool(host='ml-pipeline.kubeflow.svc.cluster.local', port=8888): Max retries exceeded with url: /apis/v1beta1/experiments?page_token=&page_size=100&sort_by= (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1cc8b3e860>: Failed to establish a new connection: [Errno -2] Name or service not known',))

During handling of the above exception, another exception occurred:

gaierror                                  Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/urllib3/connection.py in _new_conn(self)
    158             conn = connection.create_connection(
--> 159                 (self._dns_host, self.port), self.timeout, **extra_kw)
    160 

/usr/local/lib/python3.6/dist-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
     56 
---> 57     for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
     58         af, socktype, proto, canonname, sa = res

/usr/lib/python3.6/socket.py in getaddrinfo(host, port, family, type, proto, flags)
    744     addrlist = []
--> 745     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    746         af, socktype, proto, canonname, sa = res

gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    599                                                   body=body, headers=headers,
--> 600                                                   chunked=chunked)
    601 

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    353         else:
--> 354             conn.request(method, url, **httplib_request_kw)
    355 

/usr/lib/python3.6/http/client.py in request(self, method, url, body, headers, encode_chunked)
   1238         """Send a complete request to the server."""
-> 1239         self._send_request(method, url, body, headers, encode_chunked)
   1240 

/usr/lib/python3.6/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
   1284             body = _encode(body, 'body')
-> 1285         self.endheaders(body, encode_chunked=encode_chunked)
   1286 

/usr/lib/python3.6/http/client.py in endheaders(self, message_body, encode_chunked)
   1233             raise CannotSendHeader()
-> 1234         self._send_output(message_body, encode_chunked=encode_chunked)
   1235 

/usr/lib/python3.6/http/client.py in _send_output(self, message_body, encode_chunked)
   1025         del self._buffer[:]
-> 1026         self.send(msg)
   1027 

/usr/lib/python3.6/http/client.py in send(self, data)
    963             if self.auto_open:
--> 964                 self.connect()
    965             else:

/usr/local/lib/python3.6/dist-packages/urllib3/connection.py in connect(self)
    180     def connect(self):
--> 181         conn = self._new_conn()
    182         self._prepare_conn(conn)

/usr/local/lib/python3.6/dist-packages/urllib3/connection.py in _new_conn(self)
    167             raise NewConnectionError(
--> 168                 self, "Failed to establish a new connection: %s" % e)
    169 

NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f1cc8a4c5f8>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
<ipython-input-325-c8d6a70afd2d> in <module>
     10     experiment = client.get_experiment(experiment_name=experiment_name)
     11 except:
---> 12     experiment = client.create_experiment(experiment_name)
     13 
     14 print(experiment)

/usr/local/lib/python3.6/dist-packages/kfp/_client.py in create_experiment(self, name)
    172       logging.info('Creating experiment {}.'.format(name))
    173       experiment = kfp_server_api.models.ApiExperiment(name=name)
--> 174       experiment = self._experiment_api.create_experiment(body=experiment)
    175 
    176     if self._is_ipython():

/usr/local/lib/python3.6/dist-packages/kfp_server_api/api/experiment_service_api.py in create_experiment(self, body, **kwargs)
     52             return self.create_experiment_with_http_info(body, **kwargs)  # noqa: E501
     53         else:
---> 54             (data) = self.create_experiment_with_http_info(body, **kwargs)  # noqa: E501
     55             return data
     56 

/usr/local/lib/python3.6/dist-packages/kfp_server_api/api/experiment_service_api.py in create_experiment_with_http_info(self, body, **kwargs)
    129             _preload_content=params.get('_preload_content', True),
    130             _request_timeout=params.get('_request_timeout'),
--> 131             collection_formats=collection_formats)
    132 
    133     def delete_experiment(self, id, **kwargs):  # noqa: E501

/usr/local/lib/python3.6/dist-packages/kfp_server_api/api_client.py in call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, async_req, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    328                                    response_type, auth_settings,
    329                                    _return_http_data_only, collection_formats,
--> 330                                    _preload_content, _request_timeout)
    331         else:
    332             thread = self.pool.apply_async(self.__call_api, (resource_path,

/usr/local/lib/python3.6/dist-packages/kfp_server_api/api_client.py in __call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    159             post_params=post_params, body=body,
    160             _preload_content=_preload_content,
--> 161             _request_timeout=_request_timeout)
    162 
    163         self.last_response = response_data

/usr/local/lib/python3.6/dist-packages/kfp_server_api/api_client.py in request(self, method, url, query_params, headers, post_params, body, _preload_content, _request_timeout)
    371                                          _preload_content=_preload_content,
    372                                          _request_timeout=_request_timeout,
--> 373                                          body=body)
    374         elif method == "PUT":
    375             return self.rest_client.PUT(url,

/usr/local/lib/python3.6/dist-packages/kfp_server_api/rest.py in POST(self, url, headers, query_params, post_params, body, _preload_content, _request_timeout)
    273                             _preload_content=_preload_content,
    274                             _request_timeout=_request_timeout,
--> 275                             body=body)
    276 
    277     def PUT(self, url, headers=None, query_params=None, post_params=None,

/usr/local/lib/python3.6/dist-packages/kfp_server_api/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
    165                         preload_content=_preload_content,
    166                         timeout=timeout,
--> 167                         headers=headers)
    168                 elif headers['Content-Type'] == 'application/x-www-form-urlencoded':  # noqa: E501
    169                     r = self.pool_manager.request(

/usr/local/lib/python3.6/dist-packages/urllib3/request.py in request(self, method, url, fields, headers, **urlopen_kw)
     70             return self.request_encode_body(method, url, fields=fields,
     71                                             headers=headers,
---> 72                                             **urlopen_kw)
     73 
     74     def request_encode_url(self, method, url, fields=None, headers=None,

/usr/local/lib/python3.6/dist-packages/urllib3/request.py in request_encode_body(self, method, url, fields, headers, encode_multipart, multipart_boundary, **urlopen_kw)
    148         extra_kw.update(urlopen_kw)
    149 
--> 150         return self.urlopen(method, url, **extra_kw)

/usr/local/lib/python3.6/dist-packages/urllib3/poolmanager.py in urlopen(self, method, url, redirect, **kw)
    322             response = conn.urlopen(method, url, **kw)
    323         else:
--> 324             response = conn.urlopen(method, u.request_uri, **kw)
    325 
    326         redirect_location = redirect and response.get_redirect_location()

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    665                                 timeout=timeout, pool_timeout=pool_timeout,
    666                                 release_conn=release_conn, body_pos=body_pos,
--> 667                                 **response_kw)
    668 
    669         def drain_and_release_conn(response):

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    665                                 timeout=timeout, pool_timeout=pool_timeout,
    666                                 release_conn=release_conn, body_pos=body_pos,
--> 667                                 **response_kw)
    668 
    669         def drain_and_release_conn(response):

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    665                                 timeout=timeout, pool_timeout=pool_timeout,
    666                                 release_conn=release_conn, body_pos=body_pos,
--> 667                                 **response_kw)
    668 
    669         def drain_and_release_conn(response):

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    636 
    637             retries = retries.increment(method, url, error=e, _pool=self,
--> 638                                         _stacktrace=sys.exc_info()[2])
    639             retries.sleep()
    640 

/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    397 
    398         if new_retry.is_exhausted():
--> 399             raise MaxRetryError(_pool, url, error or ResponseError(cause))
    400 
    401         log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)

MaxRetryError: HTTPConnectionPool(host='ml-pipeline.kubeflow.svc.cluster.local', port=8888): Max retries exceeded with url: /apis/v1beta1/experiments (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1cc8a4c5f8>: Failed to establish a new connection: [Errno -2] Name or service not known',))

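Note that I have not included all of the retries, but I think you get the idea. I also tried using the IP provided by microk8s.enable. That gave me what looks like a successful response, but every value is None, which is still not what I want.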

client = kfp.Client(host='http://xx.xx.xx.xx.xip.io')
experiment = client.create_experiment('test')
Experiment link here

{'created_at': None, 'description': None, 'id': None, 'name': None}

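Any help would be greatly appreciated. Let me know if you need any other output to assess the problem properly. I am still learning Kubeflow, so I am not sure how to debug this, and I could not find much about it in the Kubeflow docs, the microk8s docs, or other threads. I am currently working through these two examples: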

https://github.com/kubeflow/examples/blob/master/named_entity_recognition/notebooks/Pipeline.ipynb

https://github.com/kubeflow/pipelines/blob/master/samples/tutorials/mnist/03_Reusable_Components.ipynb

1 Answer

Try this:

client = kfp.Client(host='pipelines-api.kubeflow.svc.cluster.local:8888')

This resolved the HTTPConnection error for me.
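The traceback ends in gaierror: [Errno -2] Name or service not known, which means the host name could not be resolved from the notebook. It may help to check which name resolves before creating the client. Below is a minimal sketch using only the Python standard library. The helper name resolves is hypothetical, and the two host names are simply the one from the traceback and the one suggested above. Which of them exists probably depends on how Kubeflow was installed.

```python
import socket


def resolves(host, port):
    """Return True if the host name can be resolved from this machine."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        return True
    except socket.gaierror:
        # Same error class as in the traceback: the name is unknown to DNS.
        return False


if __name__ == "__main__":
    # Host names taken from the traceback and from the suggestion above.
    candidates = [
        ("ml-pipeline.kubeflow.svc.cluster.local", 8888),
        ("pipelines-api.kubeflow.svc.cluster.local", 8888),
    ]
    for host, port in candidates:
        status = "resolves" if resolves(host, port) else "does not resolve"
        print(host, status)
```

Run this in the same notebook that creates the kfp.Client. A name that resolves there is a reasonable candidate for the host argument, although resolving does not by itself prove that the Pipelines API is listening on that port.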

Answered 2020-09-11T06:50:03.560