
I'm tasked with defining AWS tools for ML development at a medium-sized company. Assume about a dozen ML engineers plus other DevOps staff familiar with serverless (Lambdas and the Serverless Framework). The main questions are: a) what is an architecture that allows for the main tasks related to ML development (creating, training, and fitting models, data pre-processing, hyperparameter optimization, job management, wrapping serverless services, gathering model metrics, etc.); b) what are the main tools that can be used for packaging and deploying things; and c) what are the development tools (IDEs, SDKs, 'frameworks') used for it? I just want to set Jupyter notebooks aside for a second. Jupyter notebooks are great for proofs of concept and the closest thing to PowerPoint for management... but I have a problem with notebooks when thinking about deployable units of code.
My intuition points to a preliminary target architecture with 5 parts:

1 - A 'core' with ML models supporting basic model operations (create blank, create pre-trained, train, test/fit, etc.). I foresee core Python scripts here - no problem.

2 - (optional) A 'containerized set of things' that performs hyperparameter optimization and/or model versioning.

3 - A 'contained unit of Python scripts around models' that exposes an API and does job management and data pre-processing. It also reads from and writes to S3 buckets.

4 - A 'serverless layer' with a high-level API (in Python). It talks to #3 and/or #1 above (see the sketch after this list).

5 - Some container or bundling thing that will unpack files from Git and deploy them onto various AWS services, creating things from the previous 3 points.
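
To make #4 concrete: below is a minimal, hypothetical sketch of what such a serverless layer could look like as an AWS Lambda handler fronting a deployed model endpoint. The endpoint name and the event shape (an API Gateway proxy event) are assumptions for illustration, not a prescribed design.

```python
import json

import boto3

# Runtime client for invoking deployed SageMaker endpoints.
sagemaker_runtime = boto3.client("sagemaker-runtime")

# Hypothetical endpoint, created by the packaging/deployment machinery in #3/#5.
ENDPOINT_NAME = "my-model-endpoint"


def handler(event, context):
    """High-level API: API Gateway -> Lambda -> model endpoint."""
    payload = json.loads(event["body"])  # assumes an API Gateway proxy event
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(payload["features"]),
    )
    prediction = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```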

As you can see, my terms are rather fuzzy :) If someone can be specific with terms, that will be helpful. My intuition and my preliminary readings say that the answer will likely include a local IDE like PyCharm or Anaconda, or a cloud-based IDE (what can these be? - don't mention notebooks please). The point that I'm not really clear about is #5. Candidates include Amazon SageMaker Components for Kubeflow Pipelines and/or the AWS Step Functions Data Science SDK for SageMaker. It's unclear to me how they can perform #5, however. Kubeflow looks very interesting, but does it have enough adoption or will it die in 2 years? Are Amazon SageMaker Components for Kubeflow Pipelines and the AWS Step Functions Data Science SDK for SageMaker mutually exclusive? How can each of them help with 'containerizing things' and with basic provisioning and deployment tasks?


1 Answer


This is a long question, but these concerns make complete sense when you are thinking about designing ML infrastructure for production. There are three levels that define the maturity of a machine learning process.

1- CI/CD: here a Docker image goes through stages such as build, test, and pushing a versioned training image to a registry. You can also run training inside these stages and store versioned models using Git references.
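
As a hedged illustration of the "versioned training image" idea, here is a short Python sketch that tags an image with the current Git commit and pushes it to a registry. The repository URL is a placeholder, and registry authentication (e.g. via `aws ecr get-login-password`) is assumed to have happened beforehand.

```python
import subprocess

import docker  # docker-py

# Hypothetical ECR repository for training images.
REPOSITORY = "123456789012.dkr.ecr.us-east-1.amazonaws.com/ml-training"

# Tag with the Git SHA so every model can be traced to the code that built it.
git_sha = subprocess.check_output(
    ["git", "rev-parse", "--short", "HEAD"], text=True
).strip()

client = docker.from_env()
image, _build_logs = client.images.build(path=".", tag=f"{REPOSITORY}:{git_sha}")
for line in client.images.push(REPOSITORY, tag=git_sha, stream=True, decode=True):
    print(line)
```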

2- Continuous Training: here we deal with the ML pipeline, i.e. automating the process of retraining models on new data. It becomes very useful when you have to run the whole ML pipeline with new data or a new implementation.

Tools for implementation: Kubeflow Pipelines, SageMaker, Nuclio
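
For a feel of what such a pipeline looks like in code, here is a minimal sketch using the kfp v1 SDK: a preprocess step followed by a training step, compiled into a deployable package. The container images, S3 paths, and arguments are placeholders, not real components.

```python
import kfp
from kfp import dsl


def preprocess_op(data_path: str):
    # Placeholder container that reads raw data and writes processed data.
    return dsl.ContainerOp(
        name="preprocess",
        image="123456789012.dkr.ecr.us-east-1.amazonaws.com/preprocess:latest",
        arguments=["--input", data_path, "--output", "s3://my-bucket/processed/"],
    )


def train_op():
    # Placeholder container that trains on the processed data.
    return dsl.ContainerOp(
        name="train",
        image="123456789012.dkr.ecr.us-east-1.amazonaws.com/train:latest",
        arguments=["--data", "s3://my-bucket/processed/"],
    )


@dsl.pipeline(name="retraining-pipeline", description="Retrain the model on new data")
def retraining_pipeline(data_path: str = "s3://my-bucket/raw/"):
    preprocess = preprocess_op(data_path)
    train_op().after(preprocess)


if __name__ == "__main__":
    # Produces a package you can upload to a Kubeflow Pipelines deployment.
    kfp.compiler.Compiler().compile(retraining_pipeline, "retraining_pipeline.yaml")
```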

3- Continuous Delivery: where will you serve? In the cloud or at the edge? In the cloud you can use KFServing, or use SageMaker together with Kubeflow Pipelines and deploy models to SageMaker through Kubeflow.
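
The cloud-deployment step ultimately comes down to three SageMaker API calls, whether you make them yourself or let the SageMaker components for Kubeflow Pipelines wrap them. A minimal boto3 sketch, with all names, the serving image, and the artifact path as hypothetical placeholders:

```python
import boto3

sm = boto3.client("sagemaker")

ROLE_ARN = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical

# 1. Register the model artifact + serving container.
sm.create_model(
    ModelName="my-model",
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/serve:latest",
        "ModelDataUrl": "s3://my-bucket/models/model.tar.gz",
    },
    ExecutionRoleArn=ROLE_ARN,
)

# 2. Describe the fleet that will serve it.
sm.create_endpoint_config(
    EndpointConfigName="my-model-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
    }],
)

# 3. Spin up the HTTPS endpoint.
sm.create_endpoint(
    EndpointName="my-model-endpoint",
    EndpointConfigName="my-model-config",
)
```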

SageMaker and Kubeflow provide much the same functionality, but each has unique capabilities of its own. Kubeflow brings the power of Kubernetes, pipelines, portability, caching, and artifacts, while SageMaker brings managed infrastructure, the ability to scale from 0, and AWS ML services such as Athena or Ground Truth.

Solution:

Kubeflow Pipelines standalone + AWS SageMaker (training + serving the model) + Lambda to trigger the pipeline from S3 or Kinesis.
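
The Lambda part of that glue could look roughly like the sketch below: an S3 event handler that starts a run of the compiled pipeline through the KFP SDK. The KFP host URL (exposed through the ingress listed below), authentication, and the bundled pipeline package are all assumptions.

```python
import kfp

# Hypothetical URL of the KFP API, exposed via the cluster ingress.
KFP_HOST = "https://kubeflow.example.com/pipeline"


def handler(event, context):
    # S3 put event: the key of the newly arrived data file.
    new_object = event["Records"][0]["s3"]["object"]["key"]
    client = kfp.Client(host=KFP_HOST)
    # The compiled pipeline package is assumed to ship with the Lambda bundle.
    client.create_run_from_pipeline_package(
        "retraining_pipeline.yaml",
        arguments={"data_path": f"s3://my-bucket/{new_object}"},
    )
```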

Infrastructure needed:

-Kubernetes cluster (at least 1 m5 instance)
-MinIO or S3
-Container registry
-SageMaker credentials
-MySQL or RDS
-Load balancer
-Ingress for using the Kubeflow SDK

Once again, you have asked in a single question about what was a year-long journey for me. If you are interested, let's connect :)

Permissions:

Kube --> registry (Read)
Kube --> S3 (Read, Write)
Kube --> RDS (Read, Write)
Lambda --> S3 (Read)
Lambda --> Kube (API Access) 
Sagemaker --> S3, Registry
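
To show how one row of this matrix translates into IAM, here is a hedged boto3 sketch granting the Lambda role read access to the data bucket; the role, policy, and bucket names are placeholders, and the other rows follow the same pattern.

```python
import json

import boto3

iam = boto3.client("iam")

# Lambda --> S3 (Read)
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::my-bucket",
            "arn:aws:s3:::my-bucket/*",
        ],
    }],
}

iam.put_role_policy(
    RoleName="pipeline-trigger-lambda-role",  # hypothetical
    PolicyName="s3-read",
    PolicyDocument=json.dumps(policy),
)
```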

A good guide to get started:

https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/env/aws

https://aws.amazon.com/blogs/machine-learning/introducing-amazon-sagemaker-components-for-kubeflow-pipelines/

https://github.com/shashankprasanna/kubeflow-pipelines-sagemaker-examples/blob/master/kfp-sagemaker-custom-container.ipynb

Answered 2020-12-08T19:39:18.330