service - Kubernetes service is sometimes unreachable

Question

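I have a Kubernetes v1.5.2 cluster installed with kops and using the weave network plugin. I have noticed that my Kubernetes services are sometimes unreachable from pods in the cluster.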

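I went through the whole article on troubleshooting services (https://kubernetes.io/docs/admin/cluster-troubleshooting/) and I can confirm that everything behaves as expected, but sometimes it does not. The output below is from inside the cluster, trying to reach the service by its IP address. The service is backed by 5 endpoints, all up and running: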

$> curl 100.65.135.200 -vv
* Rebuilt URL to: 100.65.135.200/
*   Trying 100.65.135.200...
* connect to 100.65.135.200 port 80 failed: No route to host
* Failed to connect to 100.65.135.200 port 80: No route to host
* Closing connection 0
curl: (7) Failed to connect to 100.65.135.200 port 80: No route to host

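This is my first time setting up a cluster with kops and weave, and the first time I have seen this. If anyone has a clue about how to debug it, that would be great!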

Update

kube-proxy is registering my service correctly: I0210 23:09:41.070508 6 proxier.go:472] Adding new service "my_app/my_app:http" at 100.65.135.200:80/TCP
My pod IPs do not overlap with the cluster's IPs
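
One way to double-check the overlap claim is to compare the two CIDR ranges numerically. This is a minimal POSIX shell sketch; the function names are made up for illustration, and the CIDRs in the usage comment are placeholders, so substitute the pod and service ranges your cluster is actually configured with.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the first and last address (as integers) of a CIDR such as 10.0.0.0/8.
cidr_bounds() {
  ip=${1%/*}
  bits=${1#*/}
  base=$(ip_to_int "$ip")
  size=$(( 1 << (32 - bits) ))
  start=$(( base / size * size ))   # align to the network boundary
  echo "$start $(( start + size - 1 ))"
}

# Succeed (exit 0) if the two CIDRs share at least one address.
cidrs_overlap() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Example (placeholder ranges, not taken from this cluster):
#   if cidrs_overlap "100.64.0.0/13" "100.96.0.0/11"; then
#     echo "ranges overlap"
#   else
#     echo "ranges are disjoint"
#   fi
```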

However, I see some strange logs in the weave-kube containers on 2 nodes of the cluster:

INFO: 2017/02/11 12:14:10.959122 Discovered remote MAC b2:3e:c7:99:16:de at ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:14:10.959348 Captured frame from MAC (b2:3e:c7:99:16:de) to (ff:ff:ff:ff:ff:ff) associated with another peer ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:14:39.140186 Captured frame from MAC (06:b7:eb:e7:fa:0e) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:15:52.273667 Captured frame from MAC (32:f9:43:24:68:ad) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.686643 Captured frame from MAC (c2:58:a0:4e:b2:ff) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.686969 Captured frame from MAC (ce:7d:9f:95:66:fb) to (ff:ff:ff:ff:ff:ff) associated with another peer ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:16:56.687002 Captured frame from MAC (72:85:2b:19:65:b9) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.687042 Captured frame from MAC (f2:1a:9e:d8:7f:a3) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
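
To see which peers these errors cluster around, the log can be summarized per peer. The following is a rough sketch that only relies on the message text shown above; the function name is hypothetical, and the pod and container names in the usage comment are placeholders that depend on how weave was deployed.

```shell
# Read weave logs on stdin and count "associated with another peer"
# errors per peer, most frequent first.
count_peer_conflicts() {
  grep 'associated with another peer' |
    sed 's/.*associated with another peer //' |
    sort | uniq -c | sort -rn
}

# Example (placeholder pod and container names):
#   kubectl logs -n kube-system <weave-pod> -c <weave-container> | count_peer_conflicts
```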

I still need to investigate this.

Update 2

So these weave errors were my problem. Apparently weave requires ethtool, and it was missing from my image. I updated the AMI to 1.5 and now everything works as expected.
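
For anyone wanting to verify the same thing on their own nodes, a small check along these lines can confirm whether ethtool is on the PATH. It is only a sketch: the helper name is hypothetical, and it tests for the binary's presence, not for whatever weave does with it.

```shell
# Report whether a required tool is on PATH; returns non-zero if it is missing.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: MISSING" >&2
    return 1
  fi
}

# Run on each node, for example:
#   require_tool ethtool
```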

score 0 · Accepted Answer

everything behaves as expected, but sometimes it does not

It would be good to have more detail on this: is one pod failing while the others work, or do all pods sometimes work and sometimes fail?

In any case, here are some extra things to check:

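Are your virtual ethernet devices getting disconnected from the bridge? See https://github.com/weaveworks/weave/issues/2601
Does your pod IP address space overlap with the cluster IP address space?
Check that 100.65.135.200 is mapped by kube-proxy (this part is described in https://kubernetes.io/docs/admin/cluster-troubleshooting/)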
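
The kube-proxy check can be sketched as a filter over an iptables dump. This assumes kube-proxy is running in iptables mode and that its rules match the service IP as a /32 destination; the function name is hypothetical, and iptables-save normally needs root on the node.

```shell
# Read an iptables-save dump on stdin and print the rules whose
# destination is the given service IP.
service_ip_rules() {
  grep -F -e "-d $1/32"
}

# Example, run on a node:
#   iptables-save -t nat | service_ip_rules 100.65.135.200
# No output suggests kube-proxy has not programmed rules for that IP on this node.
```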

The final step is to look at the network packets: run tcpdump -n -i weave while running your curl test; if you don't see anything there, run the dump on the pod's veth.
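
To find candidate veth devices for that second dump, one option is to list the interfaces attached to the bridge through sysfs. This sketch assumes the bridge is named weave, as in the tcpdump command above, and that the node exposes the standard Linux /sys/class/net layout; the function name and the optional second argument are only for illustration. An interface missing from this list may also hint at the disconnected-veth issue mentioned earlier.

```shell
# List the interfaces attached to a Linux bridge by reading sysfs.
# $1: bridge name; $2: sysfs net directory (defaults to /sys/class/net).
bridge_ports() {
  bridge=$1
  sysfs=${2:-/sys/class/net}
  ls "$sysfs/$bridge/brif" 2>/dev/null
}

# Example, run on a node (tcpdump normally needs root):
#   bridge_ports weave
#   tcpdump -n -i <one-of-the-listed-interfaces>
```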
