OpenShift Bugs / OCPBUGS-56636

AWS EFS CSI Driver Operator capability needed to support zonal volumes


    • Quality / Stability / Reliability
    • Important
    • Customer Escalated

      Description of problem:

      The driver image shipped by the AWS EFS CSI Driver Operator does not include botocore, so the mount helper cannot fall back to the DescribeMountTargets API when DNS resolution of the file system name fails.
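
The fallback described above can be sketched as follows. This is a minimal illustration, not the mount helper's actual code: the function names (`efs_dns_name`, `lookup_mount_target_ip`, `resolve_efs_address`) and the Availability Zone preference are assumptions made for the example. The botocore calls used (`create_client("efs")`, `describe_mount_targets`) are the standard client API and need AWS credentials with permission to describe mount targets.

```python
import socket


def efs_dns_name(fs_id, region):
    """Build the regional EFS DNS name, in the form seen in the mount error."""
    return f"{fs_id}.efs.{region}.amazonaws.com"


def lookup_mount_target_ip(fs_id, region, az_name=None):
    """Ask the EFS API for a mount target IP; requires botocore in the image."""
    try:
        import botocore.session  # the dependency reported missing in this bug
    except ImportError as exc:
        raise RuntimeError("botocore is not installed; fallback unavailable") from exc
    client = botocore.session.get_session().create_client("efs", region_name=region)
    targets = client.describe_mount_targets(FileSystemId=fs_id)["MountTargets"]
    available = [t for t in targets if t.get("LifeCycleState") == "available"]
    if not available:
        raise RuntimeError(f"no available mount target for {fs_id}")
    # Prefer a mount target in the caller's Availability Zone when one is given
    # (hypothetical selection rule for this sketch).
    same_az = [t for t in available if t.get("AvailabilityZoneName") == az_name]
    return (same_az or available)[0]["IpAddress"]


def resolve_efs_address(fs_id, region, az_name=None,
                        resolve=socket.gethostbyname,
                        fallback=lookup_mount_target_ip):
    """Return an IP for the file system: DNS first, then the API fallback."""
    try:
        return resolve(efs_dns_name(fs_id, region))
    except OSError:  # socket.gaierror is a subclass of OSError
        return fallback(fs_id, region, az_name)
```

The `resolve` and `fallback` parameters are injectable so the control flow can be exercised without network access. In an image without botocore, the fallback branch ends in the `RuntimeError` above, which corresponds to the "Failed to import necessary dependency botocore" message reported in this issue.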

      Detailed Analysis on Customer's Issue:

      1. Multiple pods in the customer's namespace are stuck in ContainerCreating or have failed with Error:

       

      NAME                             READY   STATUS              RESTARTS   AGE
      daily-data-68fc75d444-l82mz      1/1     Running             0          2d
      daily-jupyter-559d87d94b-8qc87   1/1     Running             0          2d
      daily-queuechart-0               1/1     Running             0          2d
      daily-worker-2cgtx-wjqzb         0/1     ContainerCreating   0          1d
      daily-worker-2qnjz-9m5xt         0/1     Error               0          2d
      daily-worker-2rq2s-rmxd8         0/1     Error               0          2d
      daily-worker-428fm-2cvhs         0/1     Error               0          2d
      daily-worker-44x57-9kgfz         0/1     ContainerCreating   0          1d
      daily-worker-4fwxq-mcchh         0/1     ContainerCreating   0          1d
      daily-worker-4gtch-28r7w         0/1     Error               0          2d
      daily-worker-4m2kk-6g86v         0/1     ContainerCreating   0          1d
      daily-worker-4nfgk-bcrk5         0/1     Error               0          2d
      daily-worker-5694r-cnkkc         0/1     Error               0          2d
      daily-worker-5ckd5-kc9xp         0/1     Error               0          2d
      daily-worker-5dc97-gqh6v         0/1     Error               0          2d
      daily-worker-5k85s-kcvxq         0/1     ContainerCreating   0          1d 

      2. Events from the namespace show repeated FailedMount warnings:

       

      1m46s       Warning   FailedMount     pod/daily-worker-44x57-9kgfz   MountVolume.SetUp failed for volume "pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60" : rpc error: code = Internal desc = Could not mount "fs-0a46c02d15de3e0cf:/" at "/var/lib/kubelet/pods/98a515b9-3aeb-4904-b998-aedfe9c668dc/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount": mount failed: exit status 1...
      2h8m        Warning   FailedMount     pod/daily-worker-5k85s-kcvxq   MountVolume.SetUp failed for volume "pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60" : rpc error: code = Internal desc = Could not mount "fs-0a46c02d15de3e0cf:/" at "/var/lib/kubelet/pods/25d5cfd8-3ce6-4953-8c1f-81c121053931/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount": mount failed: exit status 1...
      10m         Warning   FailedMount     pod/daily-worker-5m468-99lsh   MountVolume.SetUp failed for volume "pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60" : rpc error: code = Internal desc = Could not mount "fs-0a46c02d15de3e0cf:/" at "/var/lib/kubelet/pods/c98ac576-621f-4acf-b21f-dbaf9d744d09/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount": mount failed: exit status 1...
      6m          Warning   FailedMount     pod/daily-worker-5x5xg-8mtjw   MountVolume.SetUp failed for volume "pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60" : rpc error: code = Internal desc = Could not mount "fs-0a46c02d15de3e0cf:/" at "/var/lib/kubelet/pods/c4bdda30-957b-4cad-8bcc-17e1dfc7363e/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount": mount failed: exit status 1...
      2h19m       Warning   FailedMount     pod/daily-worker-77k7m-x8zh8   MountVolume.SetUp failed for volume "pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60" : rpc error: code = Internal desc = Could not mount "fs-0a46c02d15de3e0cf:/" at "/var/lib/kubelet/pods/0a40c1bc-ee85-4b59-998e-665a1125ab2f/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount": mount failed: exit status 1... 

      3. In all of these events, different pods are trying to mount the same PVC, which is backed by a single EFS file system on AWS. Note the PVC name pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60 and the file system ID fs-0a46c02d15de3e0cf; both are compared against the driver logs later.
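
The comparison in step 3 can be done mechanically. The following is a small sketch, assuming the event lines have the layout shown above. The regular expression and the helper name `summarize_failed_mounts` are illustrative, and the sample lines are abbreviated copies of the events (the mount paths are cut short).

```python
import re

# Matches the pod, the volume name, and the EFS file system ID in a FailedMount message.
EVENT_RE = re.compile(
    r'pod/(?P<pod>\S+)\s+MountVolume\.SetUp failed for volume "(?P<volume>[^"]+)"'
    r'.*?Could not mount "(?P<fs_id>fs-[0-9a-f]+):'
)


def summarize_failed_mounts(lines):
    """Group the pods in FailedMount events by (volume, file system ID)."""
    groups = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            key = (match["volume"], match["fs_id"])
            groups.setdefault(key, []).append(match["pod"])
    return groups


# Abbreviated sample taken from the events above.
events = [
    'Warning FailedMount pod/daily-worker-44x57-9kgfz MountVolume.SetUp failed for volume '
    '"pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60" : rpc error: code = Internal desc = '
    'Could not mount "fs-0a46c02d15de3e0cf:/" at ...',
    'Warning FailedMount pod/daily-worker-5k85s-kcvxq MountVolume.SetUp failed for volume '
    '"pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60" : rpc error: code = Internal desc = '
    'Could not mount "fs-0a46c02d15de3e0cf:/" at ...',
]

for (volume, fs_id), pods in summarize_failed_mounts(events).items():
    print(volume, fs_id, len(pods), "pods affected")
```

A single key in the result means every failing pod depends on the same volume and the same file system.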

      4. This PVC is provisioned from the aws-efs StorageClass, which was itself created through the aws-efs-csi-driver-operator.

      5. The aws-efs-csi-driver-node-hf4f2 pod reports an error related to botocore:

      containerStatuses:
      containerID: cri-o://641e41ffdff8cf7e5a7f137363cae74932bb8bf46581a784854ae1d6186b79d2
      image: registry.redhat.io/openshift4/ose-aws-efs-csi-driver-container-rhel9@sha256:308a43c79cbf28981dc8f5688643c5ecf3a191300e48c701058a465e02be795e
      imageID: registry.redhat.io/openshift4/ose-aws-efs-csi-driver-container-rhel9@sha256:308a43c79cbf28981dc8f5688643c5ecf3a191300e48c701058a465e02be795e
      lastState:
        terminated:
          containerID: cri-o://b43d65913e1a99e87744cc7ca22263dbaac88a037cf0bae4b445114fcf5d51b6
          exitCode: 137
          finishedAt: "2025-05-12T08:13:46Z"
          message: |
            botocore. Failed to import necessary dependency botocore, please install botocore first.
            W0512 08:05:24.437892       1 reaper.go:105] reaper: failed to wait for process &{4153 21 90 4153 4153 stunnel}: no child processes
            E0512 08:09:45.196348       1 mount_linux.go:231] Mount failed: exit status 1
            Mounting command: mount
            Mounting arguments: -t efs -o accesspoint=fsap-085c440a597f9f7ab,tls fs-0a46c02d15de3e0cf:/ /var/lib/kubelet/pods/ba47340f-db9f-489c-bde2-747ccf1c7c69/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount
            Output: Failed to resolve "fs-0a46c02d15de3e0cf.efs.ap-east-1.amazonaws.com" - check that your file system ID is correct, and ensure that the VPC has an EFS mount target for this file system ID.
            See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail.
            Attempting to lookup mount target ip address using botocore. Failed to import necessary dependency botocore, please install botocore first.
            E0512 08:09:45.196433       1 driver.go:107] GRPC error: rpc error: code = Internal desc = Could not mount "fs-0a46c02d15de3e0cf:/" at "/var/lib/kubelet/pods/ba47340f-db9f-489c-bde2-747ccf1c7c69/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount": mount failed: exit status 1
            Mounting command: mount
            Mounting arguments: -t efs -o accesspoint=fsap-085c440a597f9f7ab,tls fs-0a46c02d15de3e0cf:/ /var/lib/kubelet/pods/ba47340f-db9f-489c-bde2-747ccf1c7c69/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount
            Output: Failed to resolve "fs-0a46c02d15de3e0cf.efs.ap-east-1.amazonaws.com" - check that your file system ID is correct, and ensure that the VPC has an EFS mount target for this file system ID.
            See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail.
            Attempting to lookup mount target ip address using botocore. Failed to import necessary dependency botocore, please install botocore first.
            W0512 08:09:45.199265       1 reaper.go:105] reaper: failed to wait for process &{4153 21 90 4153 4153 stunnel}: no child processes
          reason: Error
          startedAt: "2025-05-12T04:52:51Z"
      name: csi-driver
      ready: true
      restartCount: 38
      started: true
      state:
        running:
          startedAt: "2025-05-12T08:13:46Z" 

      6. The key entry in the snippet above is the following:

      Mounting command: mount
      Mounting arguments: -t efs -o accesspoint=fsap-085c440a597f9f7ab,tls fs-0a46c02d15de3e0cf:/ /var/lib/kubelet/pods/ba47340f-db9f-489c-bde2-747ccf1c7c69/volumes/kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount
      Output: Failed to resolve "fs-0a46c02d15de3e0cf.efs.ap-east-1.amazonaws.com" - check that your file system ID is correct, and ensure that the VPC has an EFS mount target for this file system ID.
      See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail.
      Attempting to lookup mount target ip address using botocore. Failed to import necessary dependency botocore, please install botocore first.

      7. The mounting arguments reference the same PVC name (pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60) and file system ID (fs-0a46c02d15de3e0cf) seen in the FailedMount events above. The mount fails because the file system's DNS name cannot be resolved, and the fallback lookup of the mount target IP address cannot run because botocore is missing from the image.
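
To cross-check the identifiers in step 7 without reading the line by eye, the "Mounting arguments" string can be split into its parts. This is an illustrative sketch: `parse_efs_mount_args` is a hypothetical helper and only handles the `-t`, `-o`, source, and target layout that appears in the log above.

```python
import shlex


def parse_efs_mount_args(arg_string):
    """Split a 'Mounting arguments' line into type, options, source and target."""
    tokens = shlex.split(arg_string)
    parsed = {"fstype": None, "options": {}, "source": None, "target": None}
    positional = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "-t":
            parsed["fstype"] = tokens[i + 1]
            i += 2
        elif tokens[i] == "-o":
            # Options are comma-separated; flags without a value become True.
            for opt in tokens[i + 1].split(","):
                key, _, value = opt.partition("=")
                parsed["options"][key] = value or True
            i += 2
        else:
            positional.append(tokens[i])
            i += 1
    # Exactly two positional arguments are expected: source and target.
    parsed["source"], parsed["target"] = positional
    return parsed


args = (
    "-t efs -o accesspoint=fsap-085c440a597f9f7ab,tls fs-0a46c02d15de3e0cf:/ "
    "/var/lib/kubelet/pods/ba47340f-db9f-489c-bde2-747ccf1c7c69/volumes/"
    "kubernetes.io~csi/pvc-399109c1-c3f5-4767-a74c-2f607a2f0b60/mount"
)
parsed = parse_efs_mount_args(args)
fs_id = parsed["source"].split(":")[0]    # file system ID before the ":/" export path
volume = parsed["target"].split("/")[-2]  # directory just above the final "mount"
print(fs_id, volume, parsed["options"])
```

The extracted file system ID and volume name can then be compared directly with the values from the FailedMount events.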

              Unassigned Unassigned
              rhn-support-pmagotra Priyansh Magotra
              Priyansh Magotra
              None
              Penghao Wang Penghao Wang
              None