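I have a Kubernetes v1.5.2 cluster installed with kops and using the weave network plugin. I've noticed that sometimes my Kubernetes services can't be reached from pods on the cluster.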

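I went through the whole article on troubleshooting services (https://kubernetes.io/docs/admin/cluster-troubleshooting/) and I can confirm that everything behaves as expected, but sometimes it doesn't (this is from within the cluster, trying to reach the service by its IP address; the service is backed by 5 endpoints, all up and running):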

$> curl 100.65.135.200 -vv
* Rebuilt URL to: 100.65.135.200/
*   Trying 100.65.135.200...
* connect to 100.65.135.200 port 80 failed: No route to host
* Failed to connect to 100.65.135.200 port 80: No route to host
* Closing connection 0
curl: (7) Failed to connect to 100.65.135.200 port 80: No route to host

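This is my first time setting up a cluster with kops and weave, and the first time I've seen this. If anyone has a clue on how to debug it, that would be great!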

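Update

  • kube-proxy is registering my service correctly: I0210 23:09:41.070508 6 proxier.go:472] Adding new service "my_app/my_app:http" at 100.65.135.200:80/TCP

  • My pod IPs do not overlap with the cluster's IPs

However, I'm seeing some odd logs from the weave-kube containers on 2 nodes of the cluster: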

INFO: 2017/02/11 12:14:10.959122 Discovered remote MAC b2:3e:c7:99:16:de at ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:14:10.959348 Captured frame from MAC (b2:3e:c7:99:16:de) to (ff:ff:ff:ff:ff:ff) associated with another peer ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:14:39.140186 Captured frame from MAC (06:b7:eb:e7:fa:0e) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:15:52.273667 Captured frame from MAC (32:f9:43:24:68:ad) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.686643 Captured frame from MAC (c2:58:a0:4e:b2:ff) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.686969 Captured frame from MAC (ce:7d:9f:95:66:fb) to (ff:ff:ff:ff:ff:ff) associated with another peer ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:16:56.687002 Captured frame from MAC (72:85:2b:19:65:b9) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.687042 Captured frame from MAC (f2:1a:9e:d8:7f:a3) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)

I need to investigate this.
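One way to start on log lines like these is to tally the "associated with another peer" errors per peer, to see whether they cluster around particular nodes. The following is only a rough sketch: `count_by_peer` is a hypothetical helper name, and it assumes the peer identifier is always the last whitespace-separated field, as in the lines above.

```shell
# Tally weave "associated with another peer" errors per peer.
# Reads log lines on stdin; prints "<count> <peer>", highest count first.
# Assumption: the peer id is the last field of each matching line.
count_by_peer() {
  awk '/associated with another peer/ { print $NF }' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'
}

# Possible usage (pod and container names are placeholders, not from the post):
#   kubectl logs -n kube-system <weave-pod-name> -c <weave-container> | count_by_peer
```

Lines that do not match (such as the INFO line) are ignored, so the whole container log can be piped in unfiltered.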

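Update 2

So those weave errors were my problem. Apparently weave requires ethtool, and it was missing from my image. I updated the AMI to 1.5 and now everything works as expected.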
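Since a missing ethtool turned out to be the cause, a quick presence check on each node's image may be worth doing before digging into the network itself. A minimal sketch, where `check_tool` is a hypothetical helper and not part of weave or kops:

```shell
# Report whether a command is available on this host.
# Prints "<name>: found" or "<name>: MISSING"; returns non-zero if missing.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: MISSING"
    return 1
  fi
}

# Possible usage on a node:
#   check_tool ethtool
```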

1 Answer

everything behaves as expected, but sometimes it doesn't

It would be good to have more detail on this: is one pod failing while the others work, or do all pods work some of the time and fail at other times?
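To put numbers on "sometimes", one option is to repeat the same request from several pods and compare the failure counts. This is a sketch only: `probe_n` is a hypothetical helper, and the number of attempts and the curl timeout in the usage comment are arbitrary choices.

```shell
# Run a probe command N times and print "ok=<n> fail=<n>".
# Usage: probe_n <count> <command> [args...]
probe_n() {
  n=$1; shift
  ok=0; fail=0; i=0
  while [ "$i" -lt "$n" ]; do
    if "$@" >/dev/null 2>&1; then
      ok=$((ok + 1))
    else
      fail=$((fail + 1))
    fi
    i=$((i + 1))
  done
  echo "ok=$ok fail=$fail"
}

# Possible usage from inside a pod (20 tries, 2-second timeout each):
#   probe_n 20 curl -s -m 2 100.65.135.200
```

If only pods on certain nodes report failures, that points at those nodes rather than at the service.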

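That said, here are some additional things to check:

  1. Are your virtual Ethernet (veth) devices getting disconnected from the bridge? See https://github.com/weaveworks/weave/issues/2601
  2. Does your pod IP address space overlap with the cluster IP address space?
  3. Check that 100.65.135.200 is mapped by kube-proxy (this part is described at https://kubernetes.io/docs/admin/cluster-troubleshooting/)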
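For item 2, the overlap check can be done mechanically once you know the two CIDR ranges. The sketch below uses plain shell arithmetic; the function names are hypothetical, and the ranges in the usage comment are illustrative values, not taken from the cluster in question.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print "<first> <last>" (as integers) for a CIDR such as 10.32.0.0/12.
cidr_range() {
  addr=${1%/*}; bits=${1#*/}
  n=$(ip_to_int "$addr")
  size=$(( 1 << (32 - bits) ))
  first=$(( n / size * size ))   # round down to the network address
  echo "$first $(( first + size - 1 ))"
}

# Succeed (exit 0) if the two CIDRs share at least one address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Possible usage (illustrative ranges only):
#   cidrs_overlap 100.64.0.0/13 100.96.0.0/11 && echo overlap || echo "no overlap"
```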

As a last step, look at the network packets: run tcpdump -n -i weave while running your curl test; if you don't see anything there, run the dump on the pod's veth instead.

answered 2017-02-11T11:43:19.787