
[AWS-EBS-CSI-Driver] allocatable volumes count incorrect in csinode for AWS arm instance types "c7gd.2xlarge, m7gd.xlarge"

      Description of problem:

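      The allocatable volume count reported in the CSINode object is incorrect for the AWS arm instance types "c7gd.2xlarge" and "m7gd.xlarge": the AWS EBS CSI driver reports 26, but only 25 volumes can actually be attached to the node.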

      Version-Release number of selected component (if applicable):

          4.15.3

      How reproducible:

          Always

      Steps to Reproduce:

          1. Create an OpenShift cluster on AWS with instance types "c7gd.2xlarge, m7gd.xlarge".
          2. Check the CSINode allocatable volume count.
          3. Create a StatefulSet with one PVC mounted per replica, with replicas set to the maximum allocatable volume count and nodeAffinity pinning all pods to the node:
      apiVersion: apps/v1
      kind: StatefulSet
      metadata:
        name: statefulset-vol-limit
      spec:
        serviceName: "my-svc"
        replicas: $VOL_COUNT_LIMIT
        selector:
          matchLabels:
            app: my-svc
        template:
          metadata:
            labels:
              app: my-svc
          spec:
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchExpressions:
                    - key: kubernetes.io/hostname
                      operator: In
                      values:
                      - $NODE_NAME
            containers:
            - name: openshifttest
              image: quay.io/openshifttest/hello-openshift@sha256:56c354e7885051b6bb4263f9faa58b2c292d44790599b7dde0e49e7c466cf339
              volumeMounts:
              - name: data
                mountPath: /mnt/storage
            tolerations:
              - key: "node-role.kubernetes.io/master"
                effect: "NoSchedule"
        volumeClaimTemplates:
        - metadata:
            name: data
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: gp3-csi
            resources:
              requests:
                storage: 1Gi
          4. Verify that all replicas of the StatefulSet become ready.
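
Step 2 can be done without reading the whole CSINode YAML. The following is a minimal sketch, assuming the `oc` client is logged in to the cluster; the function name `csinode_allocatable` is hypothetical and is not part of any tool.

```shell
# Hypothetical helper: print the allocatable volume count that a CSI driver
# reports for a node in its CSINode object.
#   $1 = node name
#   $2 = CSI driver name (defaults to the AWS EBS CSI driver)
csinode_allocatable() {
  local node=$1
  local driver=${2:-ebs.csi.aws.com}
  # The jsonpath filter selects the driver entry by name, then reads its count.
  oc get csinode "$node" \
    -o jsonpath="{.spec.drivers[?(@.name==\"${driver}\")].allocatable.count}"
}
```

The result can then feed the manifest in step 3, for example `export VOL_COUNT_LIMIT="$(csinode_allocatable "$NODE_NAME")"` before running `envsubst`.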

      Actual results:

      In step 4, the 26th replica (pod) of the StatefulSet is stuck in ContainerCreating because its volume cannot be attached to the node (the CSINode allocatable volume count is incorrect).
      $ oc get no/ip-10-0-22-114.ec2.internal -oyaml|grep 'instance'
          beta.kubernetes.io/instance-type: m7gd.xlarge
          node.kubernetes.io/instance-type: m7gd.xlarge
      $ oc get csinode/ip-10-0-22-114.ec2.internal -oyaml
      apiVersion: storage.k8s.io/v1
      kind: CSINode
      metadata:
        annotations:
          storage.alpha.kubernetes.io/migrated-plugins: kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/azure-file,kubernetes.io/cinder,kubernetes.io/gce-pd,kubernetes.io/vsphere-volume
        creationTimestamp: "2024-03-20T02:16:34Z"
        name: ip-10-0-22-114.ec2.internal
        ownerReferences:
        - apiVersion: v1
          kind: Node
          name: ip-10-0-22-114.ec2.internal
          uid: acb9a153-bb9b-4c4a-90c1-f3e095173ce2
        resourceVersion: "19281"
        uid: 12507a73-898d-441a-a844-41c7de290b5b
      spec:
        drivers:
        - allocatable:
            count: 26
          name: ebs.csi.aws.com
          nodeID: i-00ec014c5676a99d2
          topologyKeys:
          - topology.ebs.csi.aws.com/zone
      $ export VOL_COUNT_LIMIT="26"
      $ export NODE_NAME="ip-10-0-22-114.ec2.internal"
      $ envsubst < sts-vol-limit.yaml| oc apply -f -
      statefulset.apps/statefulset-vol-limit created
      $ oc get sts
      NAME                    READY   AGE
      statefulset-vol-limit   25/26   169m
      
      $ oc describe po/statefulset-vol-limit-25
      Name:             statefulset-vol-limit-25
      Namespace:        default
      Priority:         0
      Service Account:  default
      Node:             ip-10-0-22-114.ec2.internal/10.0.22.114
      Start Time:       Wed, 20 Mar 2024 18:56:08 +0800
      Labels:           app=my-svc
                        apps.kubernetes.io/pod-index=25
                        controller-revision-hash=statefulset-vol-limit-7db55989f7
                        statefulset.kubernetes.io/pod-name=statefulset-vol-limit-25
      Annotations:      k8s.ovn.org/pod-networks:
                          {"default":{"ip_addresses":["10.128.2.53/23"],"mac_address":"0a:58:0a:80:02:35","gateway_ips":["10.128.2.1"],"routes":[{"dest":"10.128.0.0...
      Status:           Pending
      IP:
      IPs:              <none>
      Controlled By:    StatefulSet/statefulset-vol-limit
      Containers:
        openshifttest:
          Container ID:
          Image:          quay.io/openshifttest/hello-openshift@sha256:56c354e7885051b6bb4263f9faa58b2c292d44790599b7dde0e49e7c466cf339
          Image ID:
          Port:           <none>
          Host Port:      <none>
          State:          Waiting
            Reason:       ContainerCreating
          Ready:          False
          Restart Count:  0
          Environment:    <none>
          Mounts:
            /mnt/storage from data (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zkwqx (ro)
      Conditions:
        Type              Status
        Initialized       True
        Ready             False
        ContainersReady   False
        PodScheduled      True
      Volumes:
        data:
          Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
          ClaimName:  data-statefulset-vol-limit-25
          ReadOnly:   false
        kube-api-access-zkwqx:
          Type:                    Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:  3607
          ConfigMapName:           kube-root-ca.crt
          ConfigMapOptional:       <nil>
          DownwardAPI:             true
          ConfigMapName:           openshift-service-ca.crt
          ConfigMapOptional:       <nil>
      QoS Class:                   BestEffort
      Node-Selectors:              <none>
      Tolerations:                 node-role.kubernetes.io/master:NoSchedule
                                   node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Events:
        Type     Reason              Age                  From                     Message
        ----     ------              ----                 ----                     -------
        Normal   Scheduled           167m                 default-scheduler        Successfully assigned default/statefulset-vol-limit-25 to ip-10-0-22-114.ec2.internal
        Warning  FailedAttachVolume  166m (x2 over 166m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b43ec1d0-4fa3-4e87-a80b-6ad912160273" : rpc error: code = Internal desc = Could not attach volume "vol-0a7cb8c5859cf3f96" to node "i-00ec014c5676a99d2": context deadline exceeded
        Warning  FailedAttachVolume  30s (x87 over 166m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b43ec1d0-4fa3-4e87-a80b-6ad912160273" : rpc error: code = Internal desc = Could not attach volume "vol-0a7cb8c5859cf3f96" to node "i-00ec014c5676a99d2": attachment of disk "vol-0a7cb8c5859cf3f96" failed, expected device to be attached but was attaching
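
To compare the reported limit with what is actually attached at the time of the failure, the VolumeAttachment objects for the node can be counted. This is a rough sketch, assuming the user can list VolumeAttachment objects cluster-wide; the function name `attached_volume_count` is hypothetical.

```shell
# Hypothetical helper: count VolumeAttachment objects that a CSI driver has
# successfully attached to one node.
#   $1 = node name
#   $2 = CSI driver name (defaults to the AWS EBS CSI driver)
attached_volume_count() {
  local node=$1
  local driver=${2:-ebs.csi.aws.com}
  oc get volumeattachment --no-headers \
    -o custom-columns=NODE:.spec.nodeName,ATTACHER:.spec.attacher,ATTACHED:.status.attached |
    awk -v n="$node" -v d="$driver" \
      '$1 == n && $2 == d && $3 == "true" { c++ } END { print c + 0 }'
}
```

If this number stays below the CSINode allocatable count while a further attach keeps failing, the reported limit is likely higher than what the instance accepts.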

      Expected results:

          In step 4, all replicas of the StatefulSet should become ready.

      Additional info:

          The allocatable volume count for the AWS arm instance types "c7gd.2xlarge, m7gd.xlarge" should be 25, not 26.
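
One possible explanation for the off-by-one, offered as an assumption rather than a confirmed root cause: if these instance types share a pool of 28 attachment slots among network interfaces, EBS volumes, and NVMe instance-store volumes (the limit AWS documents for most Nitro instance types), then a node with one network interface, one root volume, and one local NVMe disk has 25 slots left. Not counting the local disk would give 26. The helper name and all four inputs below are illustrative and are not taken from the driver source.

```shell
# Hypothetical helper: attachment slots left for CSI-managed EBS volumes.
#   $1 = total shared attachment slots (assumed 28)
#   $2 = attached network interfaces (assumed 1)
#   $3 = NVMe instance-store volumes (assumed 1 for these "d" instance types)
#   $4 = root EBS volumes (assumed 1)
expected_allocatable() {
  local total_slots=$1 enis=$2 instance_store_vols=$3 root_vols=$4
  echo $(( total_slots - enis - instance_store_vols - root_vols ))
}

expected_allocatable 28 1 1 1   # instance-store volume counted: prints 25
expected_allocatable 28 1 0 1   # instance-store volume not counted: prints 26
```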

            rbednar@redhat.com Roman Bednar
            rhn-support-pewang Penghao Wang
            Wei Duan Wei Duan