docker - Unable to ping a pod after Ubuntu cluster setup

Question

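I have followed the most recent instructions (updated 7 May 2015) to set up a cluster on Ubuntu** with etcd and flanneld. But I'm having trouble with the network... it seems to be in some kind of broken state.

**Note: I updated the config script so that it installs 0.16.2. Also, kubectl get minions returned nothing at first, but the minions appeared after I ran sudo service kube-controller-manager restart.

This is my setup: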

| ServerName | Public IP   | Private IP |  
------------------------------------------  
| KubeMaster | 107.x.x.32  | 10.x.x.54  |  
| KubeNode1  | 104.x.x.49  | 10.x.x.55  |  
| KubeNode2  | 198.x.x.39  | 10.x.x.241 |  
| KubeNode3  | 104.x.x.52  | 10.x.x.190 |  
| MongoDev1  | 162.x.x.132 | 10.x.x.59  |  
| MongoDev2  | 104.x.x.103 | 10.x.x.60  |

From any machine I can ping any other machine... it's when I create pods and services that I start running into problems.

Pod

POD                           IP                  CONTAINER(S)        IMAGE(S)                                HOST                            LABELS                                   STATUS              CREATED
auth-dev-ctl-6xah8            172.16.37.7         sis-auth            leportlabs/sisauth:latestdev            104.x.x.52/104.x.x.52   environment=dev,name=sis-auth            Running             3 hours

So this pod has been started on KubeNode3... if I try to ping it from any machine other than KubeNode3, I get a Destination Net Unreachable error. For example:

# ping 172.16.37.7
PING 172.16.37.7 (172.16.37.7) 56(84) bytes of data.
From 129.250.204.117 icmp_seq=1 Destination Net Unreachable

Running etcdctl get /coreos.com/network/config on all four machines returns {"Network":"172.16.0.0/16"}.

I'm not sure where to look from here. Can anyone help me out?

Supporting Info

On the master node:

# ps -ef | grep kube
root      4729     1  0 May07 ?        00:06:29 /opt/bin/kube-scheduler --logtostderr=true --master=127.0.0.1:8080
root      4730     1  1 May07 ?        00:21:24 /opt/bin/kube-apiserver --address=0.0.0.0 --port=8080 --etcd_servers=http://127.0.0.1:4001 --logtostderr=true --portal_net=192.168.3.0/24
root      5724     1  0 May07 ?        00:10:25 /opt/bin/kube-controller-manager --master=127.0.0.1:8080 --machines=104.x.x.49,198.x.x.39,104.x.x.52 --logtostderr=true
# ps -ef | grep etcd
root      4723     1  2 May07 ?        00:32:46 /opt/bin/etcd -name infra0 -initial-advertise-peer-urls http://107.x.x.32:2380 -listen-peer-urls http://107.x.x.32:2380 -initial-cluster-token etcd-cluster-1 -initial-cluster infra0=http://107.x.x.32:2380,infra1=http://104.x.x.49:2380,infra2=http://198.x.x.39:2380,infra3=http://104.x.x.52:2380 -initial-cluster-state new

On one of the nodes:

# ps -ef | grep kube
root     10878     1  1 May07 ?        00:16:22 /opt/bin/kubelet --address=0.0.0.0 --port=10250 --hostname_override=104.x.x.49 --api_servers=http://107.x.x.32:8080 --logtostderr=true --cluster_dns=192.168.3.10 --cluster_domain=kubernetes.local
root     10882     1  0 May07 ?        00:05:23 /opt/bin/kube-proxy --master=http://107.x.x.32:8080 --logtostderr=true
# ps -ef | grep etcd
root     10873     1  1 May07 ?        00:14:09 /opt/bin/etcd -name infra1 -initial-advertise-peer-urls http://104.x.x.49:2380 -listen-peer-urls http://104.x.x.49:2380 -initial-cluster-token etcd-cluster-1 -initial-cluster infra0=http://107.x.x.32:2380,infra1=http://104.x.x.49:2380,infra2=http://198.x.x.39:2380,infra3=http://104.x.x.52:2380 -initial-cluster-state new
# ps -ef | grep flanneld
root     19560     1  0 May07 ?        00:00:01 /opt/bin/flanneld

score 0 · Accepted Answer

So I noticed that the flannel configuration (/run/flannel/subnet.env) was different from what docker had been started with (I have no idea how they got out of sync).

# ps -ef | grep docker
root     19663     1  0 May07 ?        00:09:20 /usr/bin/docker -d -H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock --bip=172.16.85.1/24 --mtu=1472

# cat /run/flannel/subnet.env
FLANNEL_SUBNET=172.16.60.1/24
FLANNEL_MTU=1472
FLANNEL_IPMASQ=false

Note that the docker option --bip=172.16.85.1/24 is different from the flannel subnet FLANNEL_SUBNET=172.16.60.1/24.

So naturally I changed /etc/default/docker to reflect the new value.
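The comparison above can be scripted so it is easy to repeat on every node. This is only a rough sketch: the function names bip_from_cmdline, flannel_subnet_from_file and check_bip are made up for illustration, and it assumes the docker command line contains a --bip=... flag and that subnet.env has a FLANNEL_SUBNET= line, as in the output shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the --bip value found in a docker daemon command line.
bip_from_cmdline() {
    printf '%s\n' "$1" | sed -n 's/.*--bip=\([^ ]*\).*/\1/p'
}

# Hypothetical helper: print FLANNEL_SUBNET from a subnet.env-style file.
flannel_subnet_from_file() {
    sed -n 's/^FLANNEL_SUBNET=//p' "$1"
}

# Compare the two values; returns 0 when they agree, 1 on mismatch,
# 2 when either value could not be found.
check_bip() {
    bip=$(bip_from_cmdline "$1")
    subnet=$(flannel_subnet_from_file "$2")
    if [ -z "$bip" ] || [ -z "$subnet" ]; then
        echo "could not read --bip or FLANNEL_SUBNET"
        return 2
    fi
    if [ "$bip" = "$subnet" ]; then
        echo "in sync: $bip"
    else
        echo "MISMATCH: docker --bip=$bip, flannel FLANNEL_SUBNET=$subnet"
        return 1
    fi
}

# Example usage on a node (left commented out so the helpers can be sourced safely):
# check_bip "$(ps -o args= -C docker)" /run/flannel/subnet.env
```

With the values from this node it would report a mismatch between 172.16.85.1/24 and 172.16.60.1/24.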

DOCKER_OPTS="-H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock --bip=172.16.60.1/24 --mtu=1472"
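To avoid copying the values by hand, the bridge flags could be derived from subnet.env directly, since that file is plain KEY=value shell syntax. A minimal sketch, assuming the file defines FLANNEL_SUBNET and FLANNEL_MTU as shown above; docker_net_opts is a hypothetical helper name, not something flannel or docker provides.

```shell
#!/bin/sh
# Hypothetical helper: build the --bip/--mtu flags from a subnet.env-style file.
# Pass an absolute path (or one containing a slash) so that "." does not search PATH.
# The subshell keeps the FLANNEL_* variables out of the caller's environment.
docker_net_opts() {
    (
        . "$1"
        printf '%s\n' "--bip=$FLANNEL_SUBNET --mtu=$FLANNEL_MTU"
    )
}

# Example usage (commented out; this is the line that would go into /etc/default/docker):
# DOCKER_OPTS="-H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock $(docker_net_opts /run/flannel/subnet.env)"
```

Whether /etc/default/docker is evaluated in a way that allows command substitution depends on the init script, so printing the flags and pasting them in may be the safer route.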

But then sudo service docker restart reported no error, yet docker did not come back up... so I looked at /var/log/upstart/docker.log and found the following:

FATA[0000] Shutting down daemon due to errors: Bridge ip (172.16.85.1) does not match existing bridge configuration 172.16.60.1

So the final piece of the puzzle was to delete the old bridge and start docker again...

# sudo brctl delbr docker0
# sudo service docker start

If sudo brctl delbr docker0 returns "bridge docker0 is still up; can't delete it", run ifconfig docker0 down and try again.
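Put together, the recovery steps could look roughly like the sketch below. The run wrapper and the DRY_RUN variable are my own additions for illustration: by default the script only prints the commands, and it executes them only when DRY_RUN=0 is set. It assumes docker is already stopped (as it was here after the FATA error) and that the script runs as root.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Take the stale bridge down, delete it, then start docker so it recreates
# docker0 with the --bip value from /etc/default/docker.
reset_docker_bridge() {
    run ifconfig docker0 down
    run brctl delbr docker0
    run service docker start
}

# Example usage (commented out):
# reset_docker_bridge              # print the commands only
# DRY_RUN=0 reset_docker_bridge    # actually run them, as root
```

Bringing the bridge down first sidesteps the "still up" error mentioned above.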

score -1 · Answer

Please try this:

ip link del docker0
systemctl restart flanneld

Answered 2015-08-07T09:20:38.370
