0

我在同一个 Google Cloud 项目中有两个 GKE 集群,但是使用相同的 PV/PVC YAML,一个集群可以成功挂载 Filestore 实例,而另一个集群失败。失败的 GKE 集群事件如下所示:

Event  : Pod [podname] Unable to attach or mount volumes: unmounted volumes=[nfs-pv], unattached volumes=[nfs-pv]: timed out waiting for the condition   FailedMount
Event  : Pod [podname] MountVolume.SetUp failed for volume "nfs-pv" : mount failed: exit status 1

挂载失败的 Kublet 日志:

pod_workers.go:191] Error syncing pod [guid] ("[podname](guid)"), skipping: unmounted volumes=[nfs-pv], unattached volumes=[nfs-pv]: timed out waiting for the condition
kubelet.go:1622] Unable to attach or mount volumes for pod "podname(guid)": unmounted volumes=[nfs-pv], unattached volumes=[nfs-pv]: timed out waiting for the condition; skipping pod"
mount_linux.go:150] Mount failed: exit status 1
Output: Running scope as unit: run-r1fb543aa9a9246e0be396dd93bb424f6.scope
Mount failed: mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv --scope -- /home/kubernetes/containerized_mounter/mounter mount -t nfs 192.168.99.2:/mount /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv
Output: mount.nfs: Connection timed out
Mounting command: chroot
Mounting arguments: [/home/kubernetes/containerized_mounter/rootfs mount -t nfs 192.168.99.2:/mount /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv]
nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/nfs/c61546e6-9769-4e16-bd0b-c73f904272aa-nfs-pv podName:c61546e6-9769-4e16-bd0b-c73f904272aa nodeName:}" failed. No retries permitted until 2021-09-11 10:01:44.725959505 +0000 UTC m=+820955.435941160 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"nfs-pv\" (UniqueName: \"kubernetes.io/nfs/c61546e6-9769-4e16-bd0b-c73f904272aa-nfs-pv\") pod \"podname\" (UID: \"c61546e6-9769-4e16-bd0b-c73f904272aa\") : mount failed: exit status 1\nMounting command: systemd-run\nMounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv --scope -- /home/kubernetes/containerized_mounter/mounter mount -t nfs 192.168.99.2:/mount /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv\nOutput: Running scope as unit: run-r1fb543aa9a9246e0be396dd93bb424f6.scope\nMount failed: mount failed: exit status 32\nMounting command: chroot\nMounting arguments: [/home/kubernetes/containerized_mounter/rootfs mount -t nfs 192.168.99.2:/mount /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv]\nOutput: mount.nfs: Connection timed out\n"

也许一个线索是这两个集群位于不同的区域和不同的子网中?为什么 Filestore 可以连接到一个集群而不能连接到其他集群?

4

1 回答 1

1

经过很多很多小时的调试,我终于找到了答案。基于:https ://groups.google.com/g/google-cloud-filestore-discuss/c/wKTT6hEzk08

失败的 GKE 集群位于 RFC1918 之外的子网中,因此不被接受为 Filestore 客户端。一旦我们将子网更改为有效的 RFC1918,两个 GKE 集群都能够成功挂载 Filestore 实例。

这非常令人沮丧,因为 RFC1918 要求在文档或故障排除中并不明确 - 事实上,其他 Google Cloud 服务在无效的 RFC1918 子网中运行良好。

于 2021-09-15T19:13:29.787 回答