I'm trying to bring up an Ubuntu Kubernetes v1.9 cluster with the following:
export KUBE_ROOT=~/kubernetes
export NUM_NODES=1
export NODE_SIZE=n1-standard-8
export NODE_DISK_SIZE=100GB
export KUBE_GCE_INSTANCE_PREFIX=kubernetes-test
export KUBE_OS_DISTRIBUTION=ubuntu
export KUBE_MASTER_OS_DISTRIBUTION=ubuntu
export KUBE_GCE_MASTER_PROJECT=ubuntu-os-cloud
export KUBE_GCE_MASTER_IMAGE=ubuntu-1604-xenial-v20161130
export KUBE_NODE_OS_DISTRIBUTION=ubuntu
export KUBE_GCE_NODE_PROJECT=ubuntu-os-cloud
export KUBE_GCE_NODE_IMAGE=ubuntu-1604-xenial-v20161130
~/kubernetes/cluster/kube-up.sh
As is, this fails during cluster initialization:
Waiting up to 300 seconds for cluster initialization.
This will continually check to see if the API for kubernetes is reachable.
This may time out if there was some uncaught error during start up.
........................................................................................
....................................................Cluster failed to initialize within 300 seconds.
In the logs on the master (/var/log/syslog), I can see errors caused by (1) a missing python-yaml:
master configure.sh[2013]: Traceback (most recent call last):
master configure.sh[2013]: File "<string>", line 2, in <module>
master configure.sh[2013]: ImportError: No module named yaml
Fixing that (see below) leads to error messages about (2) a failure to load the Docker images:
master configure.sh[1979]: Try to load docker image file /home/kubernetes/kube-docker-files/kube-apiserver.tar
master configure.sh[1979]: timeout: failed to run command 'docker': No such file or directory
master configure.sh[1979]: message repeated 4 times: [ timeout: failed to run command 'docker': No such file or directory]
master configure.sh[1979]: Fail to load docker image file /home/kubernetes/kube-docker-files/kube-apiserver.tar after 5 retries. Exit!!
master systemd[1]: kube-master-installation.service: Main process exited, code=exited, status=1/FAILURE
master systemd[1]: Failed to start Download and install k8s binaries and configurations.
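For anyone trying to reproduce this, here is a rough sketch of how the relevant lines can be pulled out of the master's syslog. The helper name show_install_errors is made up for illustration, and the keyword list is only a guess at what counts as a failure; it is not part of the Kubernetes scripts.

```shell
# Hypothetical helper (not part of kube-up.sh or configure.sh):
# print configure.sh lines from a syslog file that look like failures.
show_install_errors() {
  # Default to the path mentioned above; pass another file to override.
  log_file="${1:-/var/log/syslog}"
  grep 'configure\.sh\[' "${log_file}" \
    | grep -E -i 'error|fail|traceback|no such file'
}
```

Running show_install_errors on the master should surface lines like the ImportError and the "failed to run command 'docker'" messages. The failing unit named in the log can also be inspected with systemctl status kube-master-installation.service.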
I worked around both problems by adding the following to kubernetes/cluster/gce/gci/configure.sh:
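function special-ubuntu-setup {
  # Special installation required for Ubuntu 16.04?
  # python-yaml is missing from the image (see the ImportError above).
  apt-get update
  apt-get install -y python-yaml
  # Install Docker CE from Docker's apt repository
  curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
  add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
  apt-get update
  # Diagnostic only: show which repository docker-ce will be installed from
  apt-cache policy docker-ce
  apt-get install -y docker-ce
}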
Then, in the "main loop" at the bottom of configure.sh, I call the function before kube-env is downloaded:
special-ubuntu-setup
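With this change to configure.sh, the cluster comes up successfully. However, patching configure.sh like this seems like very poor form. I tried adding the same commands to the startup script that is passed to the GCE instances via metadata, but that script runs after configure.sh, so it doesn't fix the errors. Note that everything works fine with the default OS (cos).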
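If the workaround has to stay in configure.sh, a slightly more defensive form might look like the sketch below. It only acts on Ubuntu and skips anything that is already installed, so the cos path should be unaffected. The names is_ubuntu, have_cmd and ubuntu_prereqs_if_needed are hypothetical, and the ID=ubuntu check assumes the usual unquoted entry in /etc/os-release. I have not verified this beyond the helper checks.

```shell
# Sketch only. is_ubuntu, have_cmd and ubuntu_prereqs_if_needed are
# hypothetical names, not functions that exist in configure.sh.

# Succeeds only if the os-release file (default /etc/os-release) says ID=ubuntu.
is_ubuntu() {
  os_release="${1:-/etc/os-release}"
  [ -r "${os_release}" ] && grep -q '^ID=ubuntu$' "${os_release}"
}

# Succeeds if the given command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

ubuntu_prereqs_if_needed() {
  # Do nothing on cos or any other non-Ubuntu image.
  is_ubuntu || return 0

  apt-get update
  # Same import that fails in the syslog traceback.
  if ! python -c 'import yaml' >/dev/null 2>&1; then
    apt-get install -y python-yaml
  fi
  # Install Docker CE only when no docker binary is present.
  if ! have_cmd docker; then
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
    add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
    apt-get update
    apt-get install -y docker-ce
  fi
}
```

It would be called at the same point as special-ubuntu-setup, before kube-env is downloaded. This is still a patch to configure.sh, so it does not answer the underlying question.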
What am I doing wrong here? Is there a better way to get an Ubuntu cluster running?