我在同一个 Google Cloud 项目中有两个 GKE 集群,但是使用相同的 PV/PVC YAML,一个集群可以成功挂载 Filestore 实例,而另一个集群失败。失败的 GKE 集群事件如下所示:
Event : Pod [podname] Unable to attach or mount volumes: unmounted volumes=[nfs-pv], unattached volumes=[nfs-pv]: timed out waiting for the condition FailedMount
Event : Pod [podname] MountVolume.SetUp failed for volume "nfs-pv" : mount failed: exit status 1
挂载失败的 Kublet 日志:
pod_workers.go:191] Error syncing pod [guid] ("[podname](guid)"), skipping: unmounted volumes=[nfs-pv], unattached volumes=[nfs-pv]: timed out waiting for the condition
kubelet.go:1622] Unable to attach or mount volumes for pod "podname(guid)": unmounted volumes=[nfs-pv], unattached volumes=[nfs-pv]: timed out waiting for the condition; skipping pod"
mount_linux.go:150] Mount failed: exit status 1
Output: Running scope as unit: run-r1fb543aa9a9246e0be396dd93bb424f6.scope
Mount failed: mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv --scope -- /home/kubernetes/containerized_mounter/mounter mount -t nfs 192.168.99.2:/mount /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv
Output: mount.nfs: Connection timed out
Mounting command: chroot
Mounting arguments: [/home/kubernetes/containerized_mounter/rootfs mount -t nfs 192.168.99.2:/mount /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv]
nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/nfs/c61546e6-9769-4e16-bd0b-c73f904272aa-nfs-pv podName:c61546e6-9769-4e16-bd0b-c73f904272aa nodeName:}" failed. No retries permitted until 2021-09-11 10:01:44.725959505 +0000 UTC m=+820955.435941160 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"nfs-pv\" (UniqueName: \"kubernetes.io/nfs/c61546e6-9769-4e16-bd0b-c73f904272aa-nfs-pv\") pod \"podname\" (UID: \"c61546e6-9769-4e16-bd0b-c73f904272aa\") : mount failed: exit status 1\nMounting command: systemd-run\nMounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv --scope -- /home/kubernetes/containerized_mounter/mounter mount -t nfs 192.168.99.2:/mount /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv\nOutput: Running scope as unit: run-r1fb543aa9a9246e0be396dd93bb424f6.scope\nMount failed: mount failed: exit status 32\nMounting command: chroot\nMounting arguments: [/home/kubernetes/containerized_mounter/rootfs mount -t nfs 192.168.99.2:/mount /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv]\nOutput: mount.nfs: Connection timed out\n"
也许一个线索是这两个集群位于不同的区域和不同的子网中?为什么 Filestore 可以连接到一个集群而不能连接到其他集群?