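I am following the official AWS EKS tutorial on setting up a distributed GPU cluster for TensorFlow model training, but I have run into a problem.
After creating a new cluster with eksctl and verifying that the corresponding ~/.kube/config file exists on the gateway node, the tutorial instructs me to download ksonnet on the gateway node and use it to initialize a new application: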
$ ks init <app-name>
However, when I try to run it, I get the following error:
INFO Using context "arn:aws:eks:us-west-2:131397771409:cluster/<cluster name>" from kubeconfig file "/home/ubuntu/.kube/config"
INFO Creating environment "default" with namespace "default", pointing to "version:v1.18.9" cluster at address <cluster address>
ERROR No Major.Minor.Patch elements found
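My reading of the error text is that some string is being parsed as a semantic version and does not split into three numeric elements. I have not confirmed this in the ksonnet source, so the sketch below is only an illustration of that kind of strict check. The helper name `parse_major_minor_patch` and the sample strings are hypothetical and are not taken from ksonnet.

```python
import re

# Hypothetical helper for illustration only; this is NOT ksonnet's code.
# It accepts an optional leading "v" followed by three dot-separated numbers.
_CORE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)")


def parse_major_minor_patch(text):
    """Return (major, minor, patch) or raise ValueError if an element is missing."""
    match = _CORE.match(text)
    if match is None:
        raise ValueError("No Major.Minor.Patch elements found")
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    # Sample inputs (assumed, for illustration): a full version, a version
    # with no patch element, and a version carrying a "version:" prefix.
    for candidate in ("v1.18.9", "v1.18", "version:v1.18.9"):
        try:
            print(candidate, "->", parse_major_minor_patch(candidate))
        except ValueError as err:
            print(candidate, "->", err)
```

Under this assumed check, a string missing the patch element, or one with an unexpected prefix, is rejected with the same message I am seeing. That would suggest the value ks is parsing is not the plain v1.18.9 I expected, but I cannot tell which value that is.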
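I have searched GitHub and Stack Overflow but have not been able to find a solution. I suspect the real answer is to move away from ksonnet, since it is no longer maintained (and apparently has not been for the past two years), but for now I would just like to be able to finish the tutorial :)
Any insight is appreciated!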
The contents of my ~/.kube/config:
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <certificate>
    server: <server>
  name: arn:aws:eks:us-west-2:131397771409:cluster/<name>
contexts:
- context:
    cluster: arn:aws:eks:us-west-2:131397771409:cluster/<name>
    user: arn:aws:eks:us-west-2:131397771409:cluster/<name>
  name: arn:aws:eks:us-west-2:131397771409:cluster/<name>
current-context: arn:aws:eks:us-west-2:131397771409:cluster/<name>
kind: Config
preferences: {}
users:
- name: arn:aws:eks:us-west-2:131397771409:cluster/<name>
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      args:
      - --region
      - us-west-2
      - eks
      - get-token
      - --cluster-name
      - <name>
      command: aws