Bug
Resolution: Done-Errata
Major
4.15
Quality / Stability / Reliability
Important
Rejected
Description of problem:
[AWS-EBS-CSI-Driver] The allocatable volumes count in CSINode is incorrect for the AWS ARM instance types "c7gd.2xlarge" and "m7gd.xlarge"
Version-Release number of selected component (if applicable):
4.15.3
How reproducible:
Always
Steps to Reproduce:
1. Create an OpenShift cluster on AWS with the instance types "c7gd.2xlarge, m7gd.xlarge"
2. Check the csinode allocatable volumes count (an example command is shown after these steps)
3. Create a statefulset with one PVC mounted per replica, the replica count set to the allocatable volumes count, and nodeAffinity pinning all pods to the node:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: statefulset-vol-limit
spec:
  serviceName: "my-svc"
  replicas: $VOL_COUNT_LIMIT
  selector:
    matchLabels:
      app: my-svc
  template:
    metadata:
      labels:
        app: my-svc
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - $NODE_NAME
      containers:
      - name: openshifttest
        image: quay.io/openshifttest/hello-openshift@sha256:56c354e7885051b6bb4263f9faa58b2c292d44790599b7dde0e49e7c466cf339
        volumeMounts:
        - name: data
          mountPath: /mnt/storage
      tolerations:
      - key: "node-role.kubernetes.io/master"
        effect: "NoSchedule"
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: gp3-csi
      resources:
        requests:
          storage: 1Gi
4. All statefulset replicas should become ready.
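For step 2, a one-liner like the following reads the allocatable count reported by the EBS CSI driver (illustrative command; the node name used here is the affected worker shown later in this report):
$ oc get csinode ip-10-0-22-114.ec2.internal -o jsonpath='{.spec.drivers[?(@.name=="ebs.csi.aws.com")].allocatable.count}'
On the affected instance types this prints 26, while 25 is the expected value.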
Actual results:
In step 4, the statefulset's 26th replica (pod) is stuck in ContainerCreating because its volume could not be attached to the node (the CSINode allocatable volumes count is incorrect):
$ oc get no/ip-10-0-22-114.ec2.internal -oyaml|grep 'instance'
beta.kubernetes.io/instance-type: m7gd.xlarge
node.kubernetes.io/instance-type: m7gd.xlarge
$ oc get csinode/ip-10-0-22-114.ec2.internal -oyaml
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  annotations:
    storage.alpha.kubernetes.io/migrated-plugins: kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/azure-file,kubernetes.io/cinder,kubernetes.io/gce-pd,kubernetes.io/vsphere-volume
  creationTimestamp: "2024-03-20T02:16:34Z"
  name: ip-10-0-22-114.ec2.internal
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: ip-10-0-22-114.ec2.internal
    uid: acb9a153-bb9b-4c4a-90c1-f3e095173ce2
  resourceVersion: "19281"
  uid: 12507a73-898d-441a-a844-41c7de290b5b
spec:
  drivers:
  - allocatable:
      count: 26
    name: ebs.csi.aws.com
    nodeID: i-00ec014c5676a99d2
    topologyKeys:
    - topology.ebs.csi.aws.com/zone
$ export VOL_COUNT_LIMIT="26"
$ export NODE_NAME="ip-10-0-22-114.ec2.internal"
$ envsubst < sts-vol-limit.yaml| oc apply -f -
statefulset.apps/statefulset-vol-limit created
$ oc get sts
NAME READY AGE
statefulset-vol-limit 25/26 169m
$ oc describe po/statefulset-vol-limit-25
Name:             statefulset-vol-limit-25
Namespace:        default
Priority:         0
Service Account:  default
Node:             ip-10-0-22-114.ec2.internal/10.0.22.114
Start Time:       Wed, 20 Mar 2024 18:56:08 +0800
Labels:           app=my-svc
                  apps.kubernetes.io/pod-index=25
                  controller-revision-hash=statefulset-vol-limit-7db55989f7
                  statefulset.kubernetes.io/pod-name=statefulset-vol-limit-25
Annotations:      k8s.ovn.org/pod-networks:
                    {"default":{"ip_addresses":["10.128.2.53/23"],"mac_address":"0a:58:0a:80:02:35","gateway_ips":["10.128.2.1"],"routes":[{"dest":"10.128.0.0...
Status:           Pending
IP:
IPs:              <none>
Controlled By:    StatefulSet/statefulset-vol-limit
Containers:
  openshifttest:
    Container ID:
    Image:          quay.io/openshifttest/hello-openshift@sha256:56c354e7885051b6bb4263f9faa58b2c292d44790599b7dde0e49e7c466cf339
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/storage from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zkwqx (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-statefulset-vol-limit-25
    ReadOnly:   false
  kube-api-access-zkwqx:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node-role.kubernetes.io/master:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 167m default-scheduler Successfully assigned default/statefulset-vol-limit-25 to ip-10-0-22-114.ec2.internal
Warning FailedAttachVolume 166m (x2 over 166m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-b43ec1d0-4fa3-4e87-a80b-6ad912160273" : rpc error: code = Internal desc = Could not attach volume "vol-0a7cb8c5859cf3f96" to node "i-00ec014c5676a99d2": context deadline exceeded
Warning FailedAttachVolume 30s (x87 over 166m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-b43ec1d0-4fa3-4e87-a80b-6ad912160273" : rpc error: code = Internal desc = Could not attach volume "vol-0a7cb8c5859cf3f96" to node "i-00ec014c5676a99d2": attachment of disk "vol-0a7cb8c5859cf3f96" failed, expected device to be attached but was attaching
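To cross-check from the AWS side, the EBS volumes actually attached to the instance can be counted with the AWS CLI (illustrative command, assuming credentials for the cluster's account and the cluster's region are configured):
$ aws ec2 describe-instances --instance-ids i-00ec014c5676a99d2 \
    --query 'Reservations[0].Instances[0].BlockDeviceMappings | length(@)'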
Expected results:
In step 4, all statefulset replicas should become ready.
Additional info:
The allocatable volumes count for the AWS ARM instance types "c7gd.2xlarge" and "m7gd.xlarge" should be "25", not "26".
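A plausible accounting for the off-by-one (an assumption based on the shared Nitro attachment limit, not verified in this report): these "d" instance types carry a local NVMe instance-store disk that consumes one of the shared attachment slots, so the limit would be
  28 shared slots - 1 ENI - 1 root EBS volume - 1 NVMe instance-store disk = 25
while the reported 26 matches the instance-store disk not being subtracted.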
links to: RHEA-2024:0041 - OpenShift Container Platform 4.16.z bug fix update