Description of problem:
The EFS operator driver image does not include botocore, so it cannot fall back to DescribeMountTargets when DNS resolution of the file system fails.
Detailed Analysis on Customer's Issue:
1. Multiple pods in the customer's namespace are stuck in ContainerCreating or Error:
NAME                             READY   STATUS              RESTARTS   AGE
daily-data-68fc75d444-l82mz      1/1     Running             0          2d
daily-jupyter-559d87d94b-8qc87   1/1     Running             0          2d
daily-queuechart-0               1/1     Running             0          2d
daily-worker-2cgtx-wjqzb         0/1     ContainerCreating   0          1d
daily-worker-2qnjz-9m5xt         0/1     Error               0          2d
daily-worker-2rq2s-rmxd8         0/1     Error               0          2d
daily-worker-428fm-2cvhs         0/1     Error               0          2d
daily-worker-44x57-9kgfz         0/1     ContainerCreating   0          1d
daily-worker-4fwxq-mcchh         0/1     ContainerCreating   0          1d
daily-worker-4gtch-28r7w         0/1     Error               0          2d
daily-worker-4m2kk-6g86v         0/1     ContainerCreating   0          1d
daily-worker-4nfgk-bcrk5         0/1     Error               0          2d
daily-worker-5694r-cnkkc         0/1     Error               0          2d
daily-worker-5ckd5-kc9xp         0/1     Error               0          2d
daily-worker-5dc97-gqh6v         0/1     Error               0          2d
daily-worker-5k85s-kcvxq         0/1     ContainerCreating   0          1d
2. Events from the namespace:
1m46s Warning FailedMount pod/daily-worker-44x57-9kgfz MountVolume.SetUp failed for volume "pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60" : rpc error: code = Internal desc = Could not mount "fs-0a46c02d15de3e0cf:/" at "/var/lib/kubelet/pods/98a515b9-3aeb-4904-b998-aedfe9c668dc/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount": mount failed: exit status 1...
2h8m Warning FailedMount pod/daily-worker-5k85s-kcvxq MountVolume.SetUp failed for volume "pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60" : rpc error: code = Internal desc = Could not mount "fs-0a46c02d15de3e0cf:/" at "/var/lib/kubelet/pods/25d5cfd8-3ce6-4953-8c1f-81c121053931/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount": mount failed: exit status 1...
10m Warning FailedMount pod/daily-worker-5m468-99lsh MountVolume.SetUp failed for volume "pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60" : rpc error: code = Internal desc = Could not mount "fs-0a46c02d15de3e0cf:/" at "/var/lib/kubelet/pods/c98ac576-621f-4acf-b21f-dbaf9d744d09/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount": mount failed: exit status 1...
6m Warning FailedMount pod/daily-worker-5x5xg-8mtjw MountVolume.SetUp failed for volume "pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60" : rpc error: code = Internal desc = Could not mount "fs-0a46c02d15de3e0cf:/" at "/var/lib/kubelet/pods/c4bdda30-957b-4cad-8bcc-17e1dfc7363e/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount": mount failed: exit status 1...
2h19m Warning FailedMount pod/daily-worker-77k7m-x8zh8 MountVolume.SetUp failed for volume "pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60" : rpc error: code = Internal desc = Could not mount "fs-0a46c02d15de3e0cf:/" at "/var/lib/kubelet/pods/0a40c1bc-ee85-4b59-998e-665a1125ab2f/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount": mount failed: exit status 1...
3. These events show that all of the failing pods are trying to mount the same PVC, which is backed by a single file system on AWS. Note the PVC name pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60 and the file system ID fs-0a46c02d15de3e0cf, because we compare them against the driver log later.
4. This PVC is provisioned from the aws-efs StorageClass, which is created through the aws-efs-csi-driver-operator.
5. The aws-efs-csi-driver-node-hf4f2 pod reports an error related to botocore:
containerStatuses:
- containerID: cri-o://641e41ffdff8cf7e5a7f137363cae74932bb8bf46581a784854ae1d6186b79d2
  image: registry.redhat.io/openshift4/ose-aws-efs-csi-driver-container-rhel9@sha256:308a43c79cbf28981dc8f5688643c5ecf3a191300e48c701058a465e02be795e
  imageID: registry.redhat.io/openshift4/ose-aws-efs-csi-driver-container-rhel9@sha256:308a43c79cbf28981dc8f5688643c5ecf3a191300e48c701058a465e02be795e
  lastState:
    terminated:
      containerID: cri-o://b43d65913e1a99e87744cc7ca22263dbaac88a037cf0bae4b445114fcf5d51b6
      exitCode: 137
      finishedAt: "2025-05-12T08:13:46Z"
      message: |
        botocore. Failed to import necessary dependency botocore, please install botocore first.
        W0512 08:05:24.437892 1 reaper.go:105] reaper: failed to wait for process &{4153 21 90 4153 4153 stunnel}: no child processes
        E0512 08:09:45.196348 1 mount_linux.go:231] Mount failed: exit status 1
        Mounting command: mount
        Mounting arguments: -t efs -o accesspoint=fsap-085c440a597f9f7ab,tls fs-0a46c02d15de3e0cf:/ /var/lib/kubelet/pods/ba47340f-db9f-489c-bde2-747ccf1c7c69/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount
        Output: Failed to resolve "fs-0a46c02d15de3e0cf.efs.ap-east-1.amazonaws.com" - check that your file system ID is correct, and ensure that the VPC has an EFS mount target for this file system ID. See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail.
        Attempting to lookup mount target ip address using botocore. Failed to import necessary dependency botocore, please install botocore first.

        E0512 08:09:45.196433 1 driver.go:107] GRPC error: rpc error: code = Internal desc = Could not mount "fs-0a46c02d15de3e0cf:/" at "/var/lib/kubelet/pods/ba47340f-db9f-489c-bde2-747ccf1c7c69/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount": mount failed: exit status 1
        Mounting command: mount
        Mounting arguments: -t efs -o accesspoint=fsap-085c440a597f9f7ab,tls fs-0a46c02d15de3e0cf:/ /var/lib/kubelet/pods/ba47340f-db9f-489c-bde2-747ccf1c7c69/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount
        Output: Failed to resolve "fs-0a46c02d15de3e0cf.efs.ap-east-1.amazonaws.com" - check that your file system ID is correct, and ensure that the VPC has an EFS mount target for this file system ID. See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail.
        Attempting to lookup mount target ip address using botocore. Failed to import necessary dependency botocore, please install botocore first.
        W0512 08:09:45.199265 1 reaper.go:105] reaper: failed to wait for process &{4153 21 90 4153 4153 stunnel}: no child processes
      reason: Error
      startedAt: "2025-05-12T04:52:51Z"
  name: csi-driver
  ready: true
  restartCount: 38
  started: true
  state:
    running:
      startedAt: "2025-05-12T08:13:46Z"
6. Look closely at this specific entry from the snippet above:
Mounting command: mount
Mounting arguments: -t efs -o accesspoint=fsap-085c440a597f9f7ab,tls fs-0a46c02d15de3e0cf:/ /var/lib/kubelet/pods/ba47340f-db9f-489c-bde2-747ccf1c7c69/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount
Output: Failed to resolve "fs-0a46c02d15de3e0cf.efs.ap-east-1.amazonaws.com" - check that your file system ID is correct, and ensure that the VPC has an EFS mount target for this file system ID. See https://docs.aws.amazon.com/console/efs/mount-dns-name for more details.
Attempting to lookup mount target ip address using botocore. Failed to import necessary dependency botocore, please install botocore first.
7. The mounting arguments reference the same PVC name pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60 and file system ID fs-0a46c02d15de3e0cf that appear in the FailedMount events above. The mount fails because the file system DNS name cannot be resolved, and the fallback lookup of a mount target IP address cannot run because botocore is missing from the image.
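The failure sequence in the log (DNS resolution fails, then the botocore-based lookup cannot start) can be reproduced with a small check. The following is only a minimal sketch for triage and is not the efs-utils implementation. The function names `efs_dns_name` and `diagnose` are hypothetical, and the DNS name format is assumed from the "Failed to resolve" message above. The resolver and importer are injectable so that the logic can be exercised without network access.

```python
import importlib
import socket


def efs_dns_name(fs_id, region):
    # Name format assumed from the "Failed to resolve" message in the driver log.
    return f"{fs_id}.efs.{region}.amazonaws.com"


def diagnose(fs_id, region, resolve=socket.gethostbyname,
             import_module=importlib.import_module):
    """Report which mount path looks available (hypothetical helper).

    Step 1: try to resolve the file system DNS name.
    Step 2: if that fails, check whether botocore can be imported, since the
            log shows the mount helper needs it for the mount target lookup.
    """
    name = efs_dns_name(fs_id, region)
    try:
        ip = resolve(name)
        return f"dns-ok: {name} -> {ip}"
    except OSError:
        # socket.gaierror is a subclass of OSError; fall through to step 2.
        pass
    try:
        import_module("botocore")
    except ImportError:
        return f"dns-failed, botocore-missing: no fallback for {name}"
    return f"dns-failed, botocore-present: fallback lookup possible for {name}"
```

Calling `diagnose("fs-0a46c02d15de3e0cf", "ap-east-1")` from an environment comparable to the csi-driver container would be expected to report the botocore-missing case if it behaves like the log above. That call performs a real DNS lookup, so the result depends on where it is run.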
depends on
- STOR-2365 AWS EFS Zonal volume support (Release Pending)
- OCPSTRAT-2137 AWS EFS Zonal volume support (Release Pending)