I have a Kubernetes v1.5.2 cluster installed with kops, using the weave network plugin. I've noticed that my Kubernetes services are sometimes unreachable from pods in the cluster.
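I went through the whole troubleshooting article (https://kubernetes.io/docs/admin/cluster-troubleshooting/) and can confirm that everything behaves as expected, except that sometimes it doesn't. Here is an attempt from inside the cluster to reach the service by its IP address; the service is backed by 5 endpoints, all up and running: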
$> curl 100.65.135.200 -vv
* Rebuilt URL to: 100.65.135.200/
* Trying 100.65.135.200...
* connect to 100.65.135.200 port 80 failed: No route to host
* Failed to connect to 100.65.135.200 port 80: No route to host
* Closing connection 0
curl: (7) Failed to connect to 100.65.135.200 port 80: No route to host
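Because the failure is intermittent, it can help to measure how often the service IP is reachable rather than relying on a single curl. The sketch below is a minimal, hypothetical helper (the name probe_service is made up for illustration); it assumes curl is available in the pod you run it from. Any HTTP response counts as a success, since only connectivity is being measured; errors such as "No route to host" count as failures.

```shell
#!/bin/sh
# Hypothetical helper: probe_service URL [ATTEMPTS]
# Connects to URL ATTEMPTS times (default 20) and prints "<ok>/<attempts> succeeded".
probe_service() {
  url=$1
  attempts=${2:-20}
  ok=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: no progress output, -o /dev/null: discard the body,
    # --connect-timeout: give up quickly if the connection cannot be made.
    # curl exits 0 for any completed HTTP exchange (no -f), so only
    # connection-level failures are counted as failures here.
    if curl -s -o /dev/null --connect-timeout 2 "$url"; then
      ok=$((ok + 1))
    fi
    i=$((i + 1))
  done
  echo "$ok/$attempts succeeded"
}

# Example against the service ClusterIP from the question:
# probe_service http://100.65.135.200/ 20
```

Running it from several pods on different nodes can show whether the failures follow particular nodes.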
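This is my first time setting up a cluster with kops and weave, and also the first time I've seen this. If anyone has a lead on how to debug it, that would be great!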
Update
kube-proxy is registering my service correctly:
I0210 23:09:41.070508 6 proxier.go:472] Adding new service "my_app/my_app:http" at 100.65.135.200:80/TCP
My pod IPs do not overlap with the cluster's IPs.
However, I see some strange logs in the weave-kube container on 2 of the cluster's nodes:
INFO: 2017/02/11 12:14:10.959122 Discovered remote MAC b2:3e:c7:99:16:de at ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:14:10.959348 Captured frame from MAC (b2:3e:c7:99:16:de) to (ff:ff:ff:ff:ff:ff) associated with another peer ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:14:39.140186 Captured frame from MAC (06:b7:eb:e7:fa:0e) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:15:52.273667 Captured frame from MAC (32:f9:43:24:68:ad) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.686643 Captured frame from MAC (c2:58:a0:4e:b2:ff) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.686969 Captured frame from MAC (ce:7d:9f:95:66:fb) to (ff:ff:ff:ff:ff:ff) associated with another peer ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:16:56.687002 Captured frame from MAC (72:85:2b:19:65:b9) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.687042 Captured frame from MAC (f2:1a:9e:d8:7f:a3) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
Investigating this.
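To see which peers these "Captured frame ... associated with another peer" errors point at, the log can be tallied per peer. This is a minimal sketch with a hypothetical function name (count_peer_errors); it reads weave log lines on stdin and assumes each error line ends with the peer name exactly as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: count "associated with another peer" errors per peer.
# Reads weave log lines on stdin, prints "<count> <peer>" with the most frequent first.
count_peer_errors() {
  # Keep only matching lines and extract the peer token, e.g.
  # "c2:58:a0:4e:b2:ff(ip-172-20-75-108)"
  sed -n 's/.*associated with another peer \([^ ]*\).*/\1/p' |
    sort | uniq -c | sort -rn |
    awk '{print $1, $2}'
}
```

A possible usage, assuming the weave DaemonSet runs in kube-system and its container is named weave (check your own DaemonSet for the actual pod and container names): `kubectl logs -n kube-system <weave-pod> -c weave | count_peer_errors`. Applied to the excerpt above, it reports 5 errors for c2:58:a0:4e:b2:ff(ip-172-20-75-108) and 2 for ce:7d:9f:95:66:fb(ip-172-20-55-245).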
Update 2
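So these weave errors were my problem. Apparently weave needs ethtool, and it was missing from my image. I updated the AMI to the 1.5 image and now everything works as expected.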
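Since the root cause here was a tool missing from the node image, a quick per-node check can catch this before weave starts misbehaving. The sketch below is a hypothetical helper (check_tools is an invented name) that uses only the POSIX `command -v` builtin; it prints one line per missing command and returns a non-zero status if any is absent.

```shell
#!/bin/sh
# Hypothetical helper: check_tools NAME...
# Prints "missing: <name>" for every command not found on PATH;
# returns 1 if at least one is missing, 0 otherwise.
check_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# Example, run on each node (e.g. over SSH):
# check_tools ethtool || echo "ethtool not found on this node"
```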