Bug
Resolution: Unresolved
Major
None
4.17, 4.16.z
Important
None
Rejected
False
Release Note Not Required
In Progress
Description of problem:
[AWS-EBS-CSI-Driver] allocatable volumes count incorrect in csinode for AWS vt1*/g4* instance types
Version-Release number of selected component (if applicable):
4.17.0-0.nightly-2024-07-16-033047
How reproducible:
Always
Steps to Reproduce:
1. Install an OpenShift cluster on AWS using instance type "vt1.3xlarge"/"g4ad.xlarge"/"g4dn.xlarge".
2. Check the csinode allocatable volumes count:

$ oc get csinode ip-10-0-53-225.ec2.internal -ojsonpath='{.spec.drivers[?(@.name=="ebs.csi.aws.com")].allocatable.count}'
26
# g4ad.xlarge -> 25
# g4dn.xlarge -> 25
# vt1.3xlarge -> 26

$ oc get no/ip-10-0-53-225.ec2.internal -oyaml | grep 'instance-type'
    beta.kubernetes.io/instance-type: vt1.3xlarge
    node.kubernetes.io/instance-type: vt1.3xlarge

3. Create a StatefulSet with PVCs (which use the EBS CSI storageclass), with nodeAffinity to the same node, and set the replicas to the max allocatable volumes count to verify that the csinode allocatable volumes count is correct and all the pods become Running (a verification sketch follows these steps).

# Test data
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: statefulset-vol-limit
spec:
  serviceName: "my-svc"
  replicas: 26
  selector:
    matchLabels:
      app: my-svc
  template:
    metadata:
      labels:
        app: my-svc
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - ip-10-0-53-225.ec2.internal # Make all volumes attach to the same node
      containers:
      - name: openshifttest
        image: quay.io/openshifttest/hello-openshift@sha256:56c354e7885051b6bb4263f9faa58b2c292d44790599b7dde0e49e7c466cf339
        volumeMounts:
        - name: data
          mountPath: /mnt/storage
      tolerations:
      - key: "node-role.kubernetes.io/master"
        effect: "NoSchedule"
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      #storageClassName: gp3-csi
      resources:
        requests:
          storage: 1Gi
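A minimal verification sketch, assuming the node name from step 2: it compares the allocatable count that the EBS CSI driver advertises in the CSINode object with the number of VolumeAttachment objects actually attached to that node.

NODE=ip-10-0-53-225.ec2.internal    # node under test (from step 2)

# Attach capacity advertised by the EBS CSI driver for this node
oc get csinode "$NODE" \
  -ojsonpath='{.spec.drivers[?(@.name=="ebs.csi.aws.com")].allocatable.count}'

# Number of EBS volumes the attach/detach controller has actually attached to it
oc get volumeattachment \
  -ojsonpath='{range .items[*]}{.spec.attacher}{" "}{.spec.nodeName}{"\n"}{end}' \
  | grep -cF "ebs.csi.aws.com $NODE"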
Actual results:
In step 3, some pods are stuck in "ContainerCreating" status because their volumes are stuck in the attaching state and cannot be attached to the node.
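A rough sketch of how the stuck pods and attachments can be inspected (the pod label comes from step 3; substitute a real pod name where indicated):

# Pods from the StatefulSet that never reached Running (ContainerCreating pods report phase Pending)
oc get pods -l app=my-svc --field-selector status.phase=Pending -o wide

# VolumeAttachment objects that never reached attached=true
oc get volumeattachment \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,ATTACHED:.status.attached \
  | grep false

# FailedAttachVolume / FailedMount events for one stuck pod
oc describe pod <stuck-pod-name>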
Expected results:
In step 3 all the pods with PVCs should become "Running", and in step 2 the csinode allocatable volumes count should be correct:
-> g4ad.xlarge allocatable count should be 24
-> g4dn.xlarge allocatable count should be 24
-> vt1.3xlarge allocatable count should be 24
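For reference, one plausible way to arrive at the expected value of 24, assuming the shared budget of 28 attachment slots on Nitro instances and assumed per-instance device counts (the breakdown below is an illustration, not taken from this bug or from the driver source):

TOTAL=28    # assumed shared attachment slots on Nitro instance types

# g4dn.xlarge: 1 primary ENI + 1 root EBS volume + 1 instance-store NVMe + 1 GPU (assumed)
echo "g4dn.xlarge: $((TOTAL - 1 - 1 - 1 - 1))"    # 24

# g4ad.xlarge: 1 primary ENI + 1 root EBS volume + 1 instance-store NVMe + 1 GPU (assumed)
echo "g4ad.xlarge: $((TOTAL - 1 - 1 - 1 - 1))"    # 24

# vt1.3xlarge: 1 primary ENI + 1 root EBS volume + 2 media accelerators (assumed)
echo "vt1.3xlarge: $((TOTAL - 1 - 1 - 2))"        # 24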
Additional info:
... attach or mount volumes: unmounted volumes=[data12 data6], unattached volumes=[data12 data6], failed to process volumes=[]: timed out waiting for the condition

06-25 17:51:23.680 Warning FailedAttachVolume 4m1s (x13 over 14m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-d08d4133-f589-4aa3-bbef-f988058c419a" : rpc error: code = Internal desc = Could not attach volume "vol-0aa138f453d414ec3" to node "i-09d532f5155b3c05d": attachment of disk "vol-0aa138f453d414ec3" failed, expected device to be attached but was attaching

06-25 17:51:23.681 Warning FailedMount 3m40s (x3 over 10m) kubelet Unable to attach or mount volumes: unmounted volumes=[data6 data12], unattached volumes=[data12 data6], failed to process volumes=[]: timed out waiting for the condition
...
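The events above can be gathered again with something like the following; the AWS-side check uses the volume ID from the FailedAttachVolume event (AWS CLI credentials and region are assumed to be configured):

# Recent attach/mount failures in the test namespace
oc get events --sort-by=.lastTimestamp | grep -E 'FailedAttachVolume|FailedMount'

# Attachment state of the volume on the AWS side
aws ec2 describe-volumes --volume-ids vol-0aa138f453d414ec3 \
  --query 'Volumes[0].Attachments'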
links to: RHEA-2024:6122 OpenShift Container Platform 4.18.z bug fix update