For a few months now I've been trying to set up a Kubernetes cluster, and so far I've had no luck.
I'm trying to set it up on 4 bare-metal PCs running CoreOS. I just reinstalled everything from scratch, but I'm running into the same problem as before. I'm following this tutorial. I think I've configured everything correctly, but I'm not 100% sure. When I reboot any of the machines, the kubelet and flanneld services are running, but `systemctl status` shows the following errors:
kubelet error: Process: 1246 ExecStartPre=/usr/bin/rkt rm --uuid-file=/var/run/kubelet-pod.uuid (code=exited, status=254)
flanneld error: Process: 1057 ExecStartPre=/usr/bin/rkt rm --uuid-file=/var/lib/coreos/flannel-wrapper.uuid (code=exited, status=254)
If I restart both services they work, or at least they appear to: I don't get any errors.
Everything else seems to be running fine, so the only remaining problem (I think) is the kube-proxy service on all the nodes.
If I run `kubectl get pods`, I can see that all the pods are running:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
kube-apiserver-kubernetes-4 1/1 Running 4 6m
kube-controller-manager-kubernetes-4 1/1 Running 6 6m
kube-proxy-kubernetes-1 1/1 Running 4 18h
kube-proxy-kubernetes-2 1/1 Running 5 26m
kube-proxy-kubernetes-3 1/1 Running 4 19m
kube-proxy-kubernetes-4 1/1 Running 4 18h
kube-scheduler-kubernetes-4 1/1 Running 6 18h
The answer to this question suggested checking whether `kubectl get node` returns the same names that are registered with the kubelet. As far as I can tell from the logs, the nodes register correctly. Here is the output of `kubectl get node`:
$ kubectl get node
NAME STATUS AGE VERSION
kubernetes-1 Ready 18h v1.6.1+coreos.0
kubernetes-2 Ready 36m v1.6.1+coreos.0
kubernetes-3 Ready 29m v1.6.1+coreos.0
kubernetes-4 Ready,SchedulingDisabled 18h v1.6.1+coreos.0
The tutorial I'm using (linked above) suggests using `--hostname-override`, but with that flag set I couldn't fetch node info on the master node (kubernetes-4) when curling the API locally. So I removed it, and now I can fetch node info without problems.
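For context, this is roughly what the relevant part of my kubelet unit looked like before I removed the flag. I'm following the tutorial here, so the wrapper path and flag set come from it; NODE_IP is a placeholder for the machine's IP:

```ini
[Service]
ExecStart=/usr/lib/coreos/kubelet-wrapper \
  --api-servers=http://127.0.0.1:8080 \
  --register-schedulable=false \
  --hostname-override=NODE_IP \
  --allow-privileged=true \
  --cluster-dns=10.3.0.10 \
  --cluster-domain=cluster.local
```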
Someone suggested this might be a flannel issue and that I should check the flannel ports. Running `netstat -lntu` gives me the following output:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:10249 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:2379 0.0.0.0:* LISTEN
tcp 0 0 MASTER_IP:2379 0.0.0.0:* LISTEN
tcp 0 0 MASTER_IP:2380 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:8080 0.0.0.0:* LISTEN
tcp6 0 0 :::4194 :::* LISTEN
tcp6 0 0 :::10250 :::* LISTEN
tcp6 0 0 :::10251 :::* LISTEN
tcp6 0 0 :::10252 :::* LISTEN
tcp6 0 0 :::10255 :::* LISTEN
tcp6 0 0 :::22 :::* LISTEN
tcp6 0 0 :::443 :::* LISTEN
udp 0 0 0.0.0.0:8472 0.0.0.0:*
So I think the ports are fine?
etcd2 also works: `etcdctl cluster-health` shows that all nodes are healthy.
Here is the part of my cloud-config that starts etcd2 on boot; apart from this it only contains SSH keys and the node username/password/groups:
#cloud-config
coreos:
etcd2:
name: "kubernetes-4"
initial-advertise-peer-urls: "http://NODE_IP:2380"
listen-peer-urls: "http://NODE_IP:2380"
listen-client-urls: "http://NODE_IP,http://127.0.0.1:2379"
advertise-client-urls: "http://NODE_IP:2379"
initial-cluster-token: "etcd-cluster-1"
initial-cluster: "kubernetes-4=http://MASTER_IP:2380,kubernetes-1=http://WORKER_1_IP:2380,kubernetes-2=http://WORKER_2_IP:2380,kubernetes-3=http://WORKER_3_IP:2380"
initial-cluster-state: "new"
units:
- name: etcd2.service
command: start
And here are the contents of the /etc/flannel/options.env file:
FLANNELD_IFACE=NODE_IP
FLANNELD_ETCD_ENDPOINTS=http://MASTER_IP:2379,http://WORKER_1_IP:2379,http://WORKER_2_IP:2379,http://WORKER_3_IP:2379
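For completeness, the flannel network config stored in etcd looks roughly like this (the udp port 8472 in the netstat output above is the vxlan backend; the exact CIDR is whatever the tutorial uses, 10.2.0.0/16 in my case):

```json
{
  "Network": "10.2.0.0/16",
  "Backend": {
    "Type": "vxlan"
  }
}
```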
The same endpoints are passed to `--etcd-servers` in the kube-apiserver.yaml file.
Any ideas or suggestions as to what the problem might be? Also, if any details are missing, let me know and I'll add them to the post.
EDIT: I forgot to include the kube-proxy logs.
kube-proxy log from the master node:
$ kubectl logs kube-proxy-kubernetes-4
I0615 07:47:45.250631 1 server.go:225] Using iptables Proxier.
W0615 07:47:45.286923 1 server.go:469] Failed to retrieve node info: Get http://127.0.0.1:8080/api/v1/nodes/kubernetes-4: dial tcp 127.0.0.1:8080: getsockopt: connection refused
W0615 07:47:45.303576 1 proxier.go:304] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0615 07:47:45.303593 1 proxier.go:309] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0615 07:47:45.303646 1 server.go:249] Tearing down userspace rules.
E0615 07:47:45.357276 1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:49: Failed to list *api.Endpoints: Get http://127.0.0.1:8080/api/v1/endpoints?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E0615 07:47:45.357278 1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:46: Failed to list *api.Service: Get http://127.0.0.1:8080/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
kube-proxy log from a worker node:
$ kubectl logs kube-proxy-kubernetes-1
I0615 07:47:33.667025 1 server.go:225] Using iptables Proxier.
W0615 07:47:33.697387 1 server.go:469] Failed to retrieve node info: Get https://MASTER_IP/api/v1/nodes/kubernetes-1: dial tcp MASTER_IP:443: getsockopt: connection refused
W0615 07:47:33.712718 1 proxier.go:304] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0615 07:47:33.712734 1 proxier.go:309] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0615 07:47:33.712773 1 server.go:249] Tearing down userspace rules.
E0615 07:47:33.787122 1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:49: Failed to list *api.Endpoints: Get https://MASTER_IP/api/v1/endpoints?resourceVersion=0: dial tcp MASTER_IP:443: getsockopt: connection refused
E0615 07:47:33.787144 1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:46: Failed to list *api.Service: Get https://MASTER_IP/api/v1/services?resourceVersion=0: dial tcp MASTER_IP:443: getsockopt: connection refused
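In case it matters, the worker kube-proxy manifest follows the tutorial and looks roughly like this (MASTER_IP is a placeholder, and the image tag matches the version shown by `kubectl get node`):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-proxy
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-proxy
    image: quay.io/coreos/hyperkube:v1.6.1_coreos.0
    command:
    - /hyperkube
    - proxy
    - --master=https://MASTER_IP
    - --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml
```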