[AWS-EBS-CSI-Driver] allocatable volumes count incorrect in csinode for AWS arm instance types "c7gd.2xlarge, m7gd.xlarge"
1. Create an OpenShift cluster on AWS with instance types "c7gd.2xlarge, m7gd.xlarge".
2. Check the csinode allocatable volumes count.
3. Create a statefulset with 1 PVC mounted per pod, replicas set to the max allocatable volumes count, and nodeAffinity pinning all pods to one node:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: statefulset-vol-limit
    spec:
      serviceName: "my-svc"
      replicas: $VOL_COUNT_LIMIT
      selector:
        matchLabels:
          app: my-svc
      template:
        metadata:
          labels:
            app: my-svc
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                    - $NODE_NAME
          containers:
          - name: openshifttest
            image: quay.io/openshifttest/hello-openshift@sha256:56c354e7885051b6bb4263f9faa58b2c292d44790599b7dde0e49e7c466cf339
            volumeMounts:
            - name: data
              mountPath: /mnt/storage
          tolerations:
          - key: "node-role.kubernetes.io/master"
            effect: "NoSchedule"
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: gp3-csi
          resources:
            requests:
              storage: 1Gi

4. All statefulset replicas should become ready.
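Steps 2 to 4 can be scripted so that the replica count always follows what the csinode reports. The following is only a sketch: the helper names `csinode_ebs_limit` and `all_replicas_ready` are hypothetical and not part of the original report, and it assumes `oc` is logged in to the cluster and that the manifest above is saved as sts-vol-limit.yaml.

```shell
# Hypothetical helper: print the allocatable volume count that the csinode
# reports for the EBS CSI driver on the given node.
csinode_ebs_limit() {
  oc get csinode "$1" \
    -o jsonpath='{.spec.drivers[?(@.name=="ebs.csi.aws.com")].allocatable.count}'
}

# Hypothetical helper: succeed only if a "READY/DESIRED" string such as
# "25/26" shows every replica ready. An empty READY part is treated as 0.
all_replicas_ready() {
  arr_ready="${1%%/*}"
  arr_desired="${1##*/}"
  [ "${arr_ready:-0}" -eq "$arr_desired" ]
}

# Usage (requires a live cluster, so it is left commented out):
#   export NODE_NAME="ip-10-0-22-114.ec2.internal"
#   export VOL_COUNT_LIMIT="$(csinode_ebs_limit "$NODE_NAME")"
#   envsubst < sts-vol-limit.yaml | oc apply -f -
#   all_replicas_ready "$(oc get sts statefulset-vol-limit \
#     -o jsonpath='{.status.readyReplicas}/{.spec.replicas}')" \
#     && echo "all replicas ready" || echo "some replicas not ready"
```

Reading the limit from the csinode instead of hard-coding it keeps the reproducer valid on other instance types, where the reported count may differ.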
In step 4, the 26th statefulset replica (pod) is stuck at ContainerCreating because its volume could not be attached to the node (the csinode allocatable volumes count is incorrect).

$ oc get no/ip-10-0-22-114.ec2.internal -oyaml|grep 'instance'
    beta.kubernetes.io/instance-type: m7gd.xlarge
    node.kubernetes.io/instance-type: m7gd.xlarge
$ oc get csinode/ip-10-0-22-114.ec2.internal -oyaml
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  annotations:
    storage.alpha.kubernetes.io/migrated-plugins: kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/azure-file,kubernetes.io/cinder,kubernetes.io/gce-pd,kubernetes.io/vsphere-volume
  creationTimestamp: "2024-03-20T02:16:34Z"
  name: ip-10-0-22-114.ec2.internal
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: ip-10-0-22-114.ec2.internal
    uid: acb9a153-bb9b-4c4a-90c1-f3e095173ce2
  resourceVersion: "19281"
  uid: 12507a73-898d-441a-a844-41c7de290b5b
spec:
  drivers:
  - allocatable:
      count: 26
    name: ebs.csi.aws.com
    nodeID: i-00ec014c5676a99d2
    topologyKeys:
    - topology.ebs.csi.aws.com/zone
$ export VOL_COUNT_LIMIT="26"
$ export NODE_NAME="ip-10-0-22-114.ec2.internal"
$ envsubst < sts-vol-limit.yaml| oc apply -f -
statefulset.apps/statefulset-vol-limit created
$ oc get sts
NAME                    READY   AGE
statefulset-vol-limit   25/26   169m
$ oc describe po/statefulset-vol-limit-25
Name:             statefulset-vol-limit-25
Namespace:        default
Priority:         0
Service Account:  default
Node:             ip-10-0-22-114.ec2.internal/
Start Time:       Wed, 20 Mar 2024 18:56:08 +0800
Labels:           app=my-svc
                  apps.kubernetes.io/pod-index=25
                  controller-revision-hash=statefulset-vol-limit-7db55989f7
                  statefulset.kubernetes.io/pod-name=statefulset-vol-limit-25
Annotations:      k8s.ovn.org/pod-networks: {"default":{"ip_addresses":[""],"mac_address":"0a:58:0a:80:02:35","gateway_ips":[""],"routes":[{"dest":"
Status:           Pending
IP:
IPs:              <none>
Controlled By:    StatefulSet/statefulset-vol-limit
Containers:
  openshifttest:
    Container ID:
    Image:          quay.io/openshifttest/hello-openshift@sha256:56c354e7885051b6bb4263f9faa58b2c292d44790599b7dde0e49e7c466cf339
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/storage from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zkwqx (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-statefulset-vol-limit-25
    ReadOnly:   false
  kube-api-access-zkwqx:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node-role.kubernetes.io/master:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason              Age                  From                     Message
  ----     ------              ----                 ----                     -------
  Normal   Scheduled           167m                 default-scheduler        Successfully assigned default/statefulset-vol-limit-25 to ip-10-0-22-114.ec2.internal
  Warning  FailedAttachVolume  166m (x2 over 166m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b43ec1d0-4fa3-4e87-a80b-6ad912160273" : rpc error: code = Internal desc = Could not attach volume "vol-0a7cb8c5859cf3f96" to node "i-00ec014c5676a99d2": context deadline exceeded
  Warning  FailedAttachVolume  30s (x87 over 166m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b43ec1d0-4fa3-4e87-a80b-6ad912160273" : rpc error: code = Internal desc = Could not attach volume "vol-0a7cb8c5859cf3f96" to node "i-00ec014c5676a99d2": attachment of disk "vol-0a7cb8c5859cf3f96" failed, expected device to be attached but was attaching
In step 4, all statefulset replicas should become ready.
For the AWS arm instance types "c7gd.2xlarge, m7gd.xlarge", the csinode allocatable volumes count should be "25", not "26".
