A while ago I created a Ceph cluster with Rook on a single-node k3s cluster, just to try it out, and it worked very well: I was able to provide storage to other pods through CephFS. I followed the examples given in the Rook quickstart documentation to set it up.
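For context, the setup was roughly the following, straight from the quickstart (a sketch from memory; the exact manifest names may differ depending on the Rook version, which in my case pulls ceph/ceph:v14.2.7, so presumably Rook 1.2.x):

git clone --single-branch --branch release-1.2 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
k3s kubectl apply -f common.yaml
k3s kubectl apply -f operator.yaml
k3s kubectl apply -f cluster-test.yaml            # single-node test cluster
k3s kubectl apply -f filesystem-test.yaml         # the CephFS filesystem
k3s kubectl apply -f csi/cephfs/storageclass.yaml # StorageClass used by the other pods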
However, two days ago, without any intervention on my part, the Ceph cluster stopped working. The Ceph manager pod seems to be the problem: my pod rook-ceph-mgr-a-6447569f69-5prdw is crash-looping. Here are its events:
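The events below were pulled from the pod description with something like:

k3s kubectl -n rook-ceph describe pod rook-ceph-mgr-a-6447569f69-5prdw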
Events:
Type     Reason       Age                    From                Message
----     ------       ----                   ----                -------
Warning  BackOff      41m (x888 over 6h5m)   kubelet, localhost  Back-off restarting failed container
Warning  Unhealthy    36m (x234 over 6h14m)  kubelet, localhost  Liveness probe failed: Get http://10.42.0.163:9283/: dial tcp 10.42.0.163:9283: connect: connection refused
Warning  FailedMount  31m (x2 over 31m)      kubelet, localhost  MountVolume.SetUp failed for volume "rook-ceph-mgr-a-keyring" : failed to sync secret cache: timed out waiting for the condition
Warning  FailedMount  31m (x2 over 31m)      kubelet, localhost  MountVolume.SetUp failed for volume "rook-ceph-mgr-token-bf88n" : failed to sync secret cache: timed out waiting for the condition
Warning  FailedMount  31m (x2 over 31m)      kubelet, localhost  MountVolume.SetUp failed for volume "rook-config-override" : failed to sync configmap cache: timed out waiting for the condition
Normal   Killing      28m (x2 over 30m)      kubelet, localhost  Container mgr failed liveness probe, will be restarted
Normal   Pulled       28m (x3 over 31m)      kubelet, localhost  Container image "ceph/ceph:v14.2.7" already present on machine
Normal   Created      28m (x3 over 31m)      kubelet, localhost  Created container mgr
Normal   Started      28m (x3 over 31m)      kubelet, localhost  Started container mgr
Warning  BackOff      6m47s (x50 over 22m)   kubelet, localhost  Back-off restarting failed container
Warning  Unhealthy    63s (x28 over 30m)     kubelet, localhost  Liveness probe failed: Get http://10.42.0.163:9283/: dial tcp 10.42.0.163:9283: connect: connection refused
I don't know whether the failed to sync secret cache error is the cause or just a consequence. Is this a Rook problem or a k3s problem?
k3s kubectl logs rook-ceph-mgr-a-6447569f69-5prdw -n rook-ceph produces no output (adding -p does not change anything).
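I am not sure what else to check; the next things I plan to try are roughly these (just a sketch):

k3s kubectl -n rook-ceph get pods                            # overall state of the Rook/Ceph pods
k3s kubectl -n rook-ceph logs deploy/rook-ceph-operator      # whether the operator reports anything about mgr-a
k3s kubectl -n rook-ceph get secret rook-ceph-mgr-a-keyring  # whether the secret behind the failing volume mount still exists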
Thanks for your help. This is my first question on Stack Overflow, I hope I did it right :)