OpenShift Bugs · OCPBUGS-37088
[AWS-EBS-CSI-Driver] allocatable volumes count incorrect in csinode for AWS vt1*/g4* instance types


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: 4.17, 4.16.z
    • Component: Storage
    • Severity: Important
    • Release Note Type: Release Note Not Required
    • Status: In Progress

      Description of problem:

      [AWS-EBS-CSI-Driver] allocatable volumes count incorrect in csinode for AWS vt1*/g4* instance types    

      Version-Release number of selected component (if applicable):

       4.17.0-0.nightly-2024-07-16-033047   

      How reproducible:

       Always

      Steps to Reproduce:

      1. Install an OpenShift cluster on AWS using instance type "vt1.3xlarge", "g4ad.xlarge", or "g4dn.xlarge"
      
      2. Check the CSINode allocatable volume count
      $ oc get csinode ip-10-0-53-225.ec2.internal -ojsonpath='{.spec.drivers[?(@.name=="ebs.csi.aws.com")].allocatable.count}'
      26
      
      g4ad.xlarge # 25 
      g4dn.xlarge # 25
      vt1.3xlarge # 26                                                              
      
      $ oc get no/ip-10-0-53-225.ec2.internal -oyaml| grep 'instance-type'
          beta.kubernetes.io/instance-type: vt1.3xlarge
          node.kubernetes.io/instance-type: vt1.3xlarge
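      The jsonpath query above can also be run against a saved copy of the CSINode object. A minimal sketch — the JSON below is a trimmed sample standing in for the real object (assumption: only the fields relevant to this check are kept), mirroring the vt1.3xlarge output above:

```shell
# On a live cluster, save the object with:
#   oc get csinode ip-10-0-53-225.ec2.internal -o json > /tmp/csinode.json
# Here a trimmed sample stands in for illustration:
cat > /tmp/csinode.json <<'EOF'
{"spec":{"drivers":[{"name":"ebs.csi.aws.com","allocatable":{"count":26}}]}}
EOF
# Extract the advertised allocatable count (26 for this vt1.3xlarge node):
grep -o '"count":[0-9]*' /tmp/csinode.json | grep -o '[0-9]*$'
```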
      3. Create a statefulset with PVCs (using the EBS CSI storageclass), pin it with nodeAffinity to the same node, and set replicas to the maximum allocatable volume count to verify that the CSINode allocatable volume count is correct and all pods become Running
      
      # Test data
      apiVersion: apps/v1
      kind: StatefulSet
      metadata:
        name: statefulset-vol-limit
      spec:
        serviceName: "my-svc"
        replicas: 26
        selector:
          matchLabels:
            app: my-svc
        template:
          metadata:
            labels:
              app: my-svc
          spec:
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchExpressions:
                    - key: kubernetes.io/hostname
                      operator: In
                      values:
                      - ip-10-0-53-225.ec2.internal # Make all volume attach to the same node
            containers:
            - name: openshifttest
              image: quay.io/openshifttest/hello-openshift@sha256:56c354e7885051b6bb4263f9faa58b2c292d44790599b7dde0e49e7c466cf339
              volumeMounts:
              - name: data
                mountPath: /mnt/storage
            tolerations:
              - key: "node-role.kubernetes.io/master"
                effect: "NoSchedule"
        volumeClaimTemplates:
        - metadata:
            name: data
          spec:
            accessModes: [ "ReadWriteOnce" ]
            #storageClassName: gp3-csi
            resources:
              requests:
                storage: 1Gi

      Actual results:

      In step 3 some pods are stuck in "ContainerCreating" because their volumes are stuck in the attaching state and cannot be attached to the node
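      The number of stuck pods follows from the gap between the advertised and the actually usable attachment count. A quick sanity check, using the values this report gives for the vt1.3xlarge case (the expected count of 24 is the report's claim under "Expected results", not derived here):

```shell
# Sanity check (values from this report, vt1.3xlarge case):
advertised=26   # allocatable.count published in the CSINode object
usable=24       # attachments that actually succeed per "Expected results"
# Two replicas can never attach their volumes -- consistent with the two
# unattached volumes (data6, data12) in the events under "Additional info":
echo "pods stuck waiting for attach: $((advertised - usable))"
```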

      Expected results:

       In step 3 all the pods with PVCs should become "Running", and in step 2 the CSINode allocatable volume count should be correct:
      
      -> g4ad.xlarge allocatable count should be 24
      -> g4dn.xlarge allocatable count should be 24
      -> vt1.3xlarge allocatable count should be 24   
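      The observed-versus-expected counts can be tabulated in one pass (all values copied from this report; the "expected" column is the report's claim, not computed here):

```shell
# Compare the advertised allocatable counts (step 2) against the values this
# report argues are correct for each instance type:
report=$(
  while read itype observed expected; do
    if [ "$observed" -eq "$expected" ]; then status=ok; else status=MISMATCH; fi
    printf '%s observed=%s expected=%s %s\n' "$itype" "$observed" "$expected" "$status"
  done <<'EOF'
g4ad.xlarge 25 24
g4dn.xlarge 25 24
vt1.3xlarge 26 24
EOF
)
printf '%s\n' "$report"
```

All three instance types over-report, so each would strand pods once the real attachment limit is hit.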

      Additional info:

        ...
      attach or mount volumes: unmounted volumes=[data12 data6], unattached volumes=[data12 data6], failed to process volumes=[]: timed out waiting for the condition
      06-25 17:51:23.680      Warning  FailedAttachVolume      4m1s (x13 over 14m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-d08d4133-f589-4aa3-bbef-f988058c419a" : rpc error: code = Internal desc = Could not attach volume "vol-0aa138f453d414ec3" to node "i-09d532f5155b3c05d": attachment of disk "vol-0aa138f453d414ec3" failed, expected device to be attached but was attaching
      06-25 17:51:23.681      Warning  FailedMount             3m40s (x3 over 10m)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[data6 data12], unattached volumes=[data12 data6], failed to process volumes=[]: timed out waiting for the condition
      ...  

              Assignee: Maxim Patlasov
              Reporter: Penghao Wang
              Votes: 0
              Watchers: 5