Type: Bug
Resolution: Unresolved
Priority: Normal
Affects Version: 4.18
Quality / Stability / Reliability
Description of problem:
Deploying a workload that deletes and recreates pods on 100+ nodes while scaling it up to ~6000 pods in total (between server, client and dpdk pods). About 55% of these pods consume SR-IOV VFs, which are a limited resource on the nodes (64 per node). When the pods are first deployed the distribution is already skewed (see below); when the pods are recreated (churn), the distribution becomes worse (see below), causing pods to end up Pending with a VF exhaustion error (see below). I have tried node/pod affinities and topology spread constraints (TSC) to influence the distribution, and the only configuration that does seem to avoid this scenario is one where server pods (which can be scheduled anywhere) are not scheduled on the same nodes as dpdk pods (which run only on worker-dpdk nodes). Both pod types request VFs.

pod description (error):
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  7m4s                default-scheduler  0/118 nodes are available: 1 Insufficient hugepages-1Gi, 26 Insufficient openshift.io/intelnics2, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 4 node(s) were unschedulable, 85 node(s) didn't match Pod's node affinity/selector. preemption: 0/118 nodes are available: 26 No preemption victims found for incoming pod, 92 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  54s (x2 over 6m2s)  default-scheduler  0/118 nodes are available: 1 Insufficient hugepages-1Gi, 26 Insufficient openshift.io/intelnics2, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 4 node(s) were unschedulable, 85 node(s) didn't match Pod's node affinity/selector. preemption: 0/118 nodes are available: 26 No preemption victims found for incoming pod, 92 Preemption is not helpful for scheduling.

node events:
  7m37s  Warning  FailedScheduling  pod/dpdk-1-57fdbfbb54-ddz8t  0/118 nodes are available: 1 Insufficient hugepages-1Gi, 26 Insufficient openshift.io/intelnics2, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 4 node(s) were unschedulable, 85 node(s) didn't match Pod's node affinity/selector. preemption: 0/118 nodes are available: 26 No preemption victims found for incoming pod, 92 Preemption is not helpful for scheduling.
  87s    Warning  FailedScheduling  pod/dpdk-1-57fdbfbb54-ddz8t  0/118 nodes are available: 1 Insufficient hugepages-1Gi, 26 Insufficient openshift.io/intelnics2, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 4 node(s) were unschedulable, 85 node(s) didn't match Pod's node affinity/selector. preemption: 0/118 nodes are available: 26 No preemption victims found for incoming pod, 92 Preemption is not helpful for scheduling.
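For context, each VF-consuming pod requests the device-plugin resource named in the events above (openshift.io/intelnics2). A minimal sketch of such a request is shown below; the container name, image and request count are illustrative placeholders, not values taken from the actual workload:

  # Illustrative sketch only: each requested unit of openshift.io/intelnics2
  # consumes one of the 64 VFs available per node, which is the resource
  # being exhausted on the over-packed nodes.
  spec:
    containers:
    - name: dpdk                        # placeholder container name
      image: example.com/app:latest     # placeholder image
      resources:
        requests:
          openshift.io/intelnics2: "1"
        limits:
          openshift.io/intelnics2: "1"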
Pod distribution/spread (server) - BEFORE churn, first deployment (columns: pod count, node, node roles):
48 e23-h26-b03-fc640 customcnf,worker
47 e23-h26-b02-fc640 customcnf,worker
41 e17-h24-b01-fc640 worker,worker-metallb
41 e17-h18-b04-fc640 worker,worker-metallb
40 e17-h20-b04-fc640 worker,worker-metallb
39 e18-h12-b02-fc640 worker,worker-metallb
39 e17-h20-b01-fc640 worker,worker-metallb
38 e17-h24-b02-fc640 worker,worker-metallb
38 e17-h20-b03-fc640 worker,worker-metallb
37 e23-h26-b01-fc640 customcnf,worker
37 e18-h12-b03-fc640 worker,worker-metallb
37 e17-h20-b02-fc640 worker,worker-metallb
35 e17-h24-b04-fc640 worker,worker-metallb
34 e18-h14-b01-fc640 customcnf,worker
33 e20-h18-b01-fc640 customcnf,worker
33 e19-h26-b02-fc640 customcnf,worker
33 e19-h18-b02-fc640 customcnf,worker
33 e18-h20-b02-fc640 customcnf,worker
33 e18-h18-b02-fc640 customcnf,worker
33 e18-h14-b04-fc640 customcnf,worker
32 e23-h14-b03-fc640 customcnf,worker
32 e19-h24-b02-fc640 customcnf,worker
32 e19-h18-b04-fc640 customcnf,worker
32 e19-h18-b03-fc640 customcnf,worker
32 e18-h24-b03-fc640 customcnf,worker
32 e18-h18-b04-fc640 customcnf,worker
32 e18-h18-b03-fc640 customcnf,worker
32 e17-h18-b03-fc640 customcnf,worker
31 e23-h20-b02-fc640 customcnf,worker
31 e23-h18-b03-fc640 customcnf,worker
31 e22-h18-b03-fc640 customcnf,worker
31 e22-h18-b01-fc640 customcnf,worker
31 e20-h26-b01-fc640 customcnf,worker
31 e20-h24-b01-fc640 customcnf,worker
31 e20-h14-b01-fc640 customcnf,worker
31 e20-h12-b01-fc640 customcnf,worker
31 e19-h20-b02-fc640 customcnf,worker
31 e18-h20-b04-fc640 customcnf,worker
30 e23-h24-b04-fc640 customcnf,worker
30 e23-h24-b03-fc640 customcnf,worker
30 e23-h24-b02-fc640 customcnf,worker
30 e23-h20-b04-fc640 customcnf,worker
30 e23-h20-b01-fc640 customcnf,worker
30 e23-h18-b02-fc640 customcnf,worker
30 e23-h12-b04-fc640 customcnf,worker
30 e21-h20-b03-fc640 customcnf,worker
30 e20-h20-b03-fc640 customcnf,worker
30 e20-h14-b02-fc640 customcnf,worker
30 e20-h12-b04-fc640 customcnf,worker
30 e19-h24-b04-fc640 customcnf,worker
30 e18-h24-b01-fc640 customcnf,worker
30 e18-h20-b01-fc640 customcnf,worker
30 e16-h26-b04-fc640 customcnf,worker
29 e23-h18-b01-fc640 customcnf,worker
29 e23-h12-b03-fc640 customcnf,worker
29 e20-h24-b04-fc640 customcnf,worker
29 e20-h24-b02-fc640 customcnf,worker
29 e20-h20-b04-fc640 customcnf,worker
29 e20-h20-b02-fc640 customcnf,worker
29 e20-h20-b01-fc640 customcnf,worker
29 e20-h18-b02-fc640 customcnf,worker
29 e20-h12-b03-fc640 customcnf,worker
29 e19-h20-b04-fc640 customcnf,worker
29 e19-h18-b01-fc640 customcnf,worker
29 e18-h14-b03-fc640 customcnf,worker
29 e18-h12-b04-fc640 customcnf,worker
28 e23-h14-b04-fc640 customcnf,worker
28 e20-h26-b04-fc640 customcnf,worker
28 e20-h24-b03-fc640 customcnf,worker
28 e20-h14-b03-fc640 customcnf,worker
28 e19-h24-b01-fc640 customcnf,worker
27 e19-h26-b04-fc640 customcnf,worker
27 e19-h26-b03-fc640 customcnf,worker
26 e22-h18-b02-fc640 customcnf,worker
26 e21-h24-b02-fc640 customcnf,worker
26 e18-h20-b03-fc640 customcnf,worker
26 e18-h18-b01-fc640 customcnf,worker
25 e19-h20-b01-fc640 customcnf,worker
24 e23-h20-b03-fc640 customcnf,worker
24 e19-h20-b03-fc640 customcnf,worker
23 e20-h12-b02-fc640 customcnf,worker
23 e17-h24-b03-fc640 worker,worker-metallb
22 e19-h26-b01-fc640 customcnf,worker
20 e20-h14-b04-fc640 customcnf,worker
20 e19-h24-b03-fc640 customcnf,worker
20 e17-h12-b04-fc640 worker,worker-dpdk
17 e17-h14-b04-fc640 worker,worker-dpdk
17 e16-h26-b02-fc640 worker,worker-dpdk
16 e17-h18-b02-fc640 worker,worker-dpdk
16 e17-h14-b01-fc640 worker,worker-dpdk
16 e16-h14-b02-fc640 worker,worker-dpdk
15 e17-h14-b03-fc640 worker,worker-dpdk
15 e16-h20-b01-fc640 worker,worker-dpdk
15 e16-h18-b02-fc640 worker,worker-dpdk
14 e16-h18-b01-fc640 worker,worker-dpdk
13 e17-h12-b03-fc640 worker,worker-dpdk
13 e16-h24-b02-fc640 worker,worker-dpdk
13 e16-h18-b03-fc640 worker,worker-dpdk
12 e17-h12-b02-fc640 worker,worker-dpdk
12 e16-h26-b01-fc640 worker,worker-dpdk
11 e17-h12-b01-fc640 worker,worker-dpdk
11 e16-h26-b03-fc640 worker,worker-dpdk
11 e16-h24-b03-fc640 worker,worker-dpdk
11 e16-h18-b04-fc640 worker,worker-dpdk
10 e16-h14-b04-fc640 worker,worker-dpdk
10 e16-h14-b01-fc640 worker,worker-dpdk
9 e17-h14-b02-fc640 worker,worker-dpdk
8 e17-h18-b01-fc640 worker,worker-dpdk
6 e16-h24-b01-fc640 worker,worker-dpdk
5 e16-h24-b04-fc640 worker,worker-dpdk
3 e16-h20-b03-fc640 worker,worker-dpdk

Pod spread (server) - AFTER/DURING churn, recreating pods:
64 e23-h26-b03-fc640 customcnf,worker
64 e23-h26-b02-fc640 customcnf,worker
64 e18-h12-b03-fc640 worker,worker-metallb
64 e18-h12-b02-fc640 worker,worker-metallb
64 e17-h24-b04-fc640 worker,worker-metallb
64 e17-h24-b03-fc640 worker,worker-metallb
64 e17-h24-b02-fc640 worker,worker-metallb
64 e17-h24-b01-fc640 worker,worker-metallb
64 e17-h20-b04-fc640 worker,worker-metallb
64 e17-h20-b03-fc640 worker,worker-metallb
64 e17-h20-b02-fc640 worker,worker-metallb
64 e17-h20-b01-fc640 worker,worker-metallb
64 e17-h18-b04-fc640 worker,worker-metallb
56 e17-h12-b04-fc640 worker,worker-dpdk
54 e17-h14-b03-fc640 worker,worker-dpdk
54 e17-h14-b02-fc640 worker,worker-dpdk
54 e17-h14-b01-fc640 worker,worker-dpdk
54 e17-h12-b03-fc640 worker,worker-dpdk
54 e17-h12-b01-fc640 worker,worker-dpdk
54 e16-h24-b02-fc640 worker,worker-dpdk
54 e16-h24-b01-fc640 worker,worker-dpdk
54 e16-h14-b01-fc640 worker,worker-dpdk
52 e17-h18-b02-fc640 worker,worker-dpdk
52 e17-h18-b01-fc640 worker,worker-dpdk
52 e17-h12-b02-fc640 worker,worker-dpdk
52 e16-h26-b02-fc640 worker,worker-dpdk
52 e16-h24-b03-fc640 worker,worker-dpdk
52 e16-h20-b01-fc640 worker,worker-dpdk
52 e16-h18-b02-fc640 worker,worker-dpdk
52 e16-h14-b04-fc640 worker,worker-dpdk
52 e16-h14-b02-fc640 worker,worker-dpdk
50 e17-h14-b04-fc640 worker,worker-dpdk
50 e16-h26-b03-fc640 worker,worker-dpdk
50 e16-h24-b04-fc640 worker,worker-dpdk
50 e16-h18-b03-fc640 worker,worker-dpdk
50 e16-h18-b01-fc640 worker,worker-dpdk
48 e16-h20-b03-fc640 worker,worker-dpdk
46 e16-h26-b01-fc640 worker,worker-dpdk
46 e16-h18-b04-fc640 worker,worker-dpdk
17 e23-h14-b03-fc640 customcnf,worker
17 e19-h18-b02-fc640 customcnf,worker
15 e23-h24-b03-fc640 customcnf,worker
15 e18-h24-b01-fc640 customcnf,worker
15 e18-h12-b04-fc640 customcnf,worker
14 e23-h20-b04-fc640 customcnf,worker
14 e22-h18-b03-fc640 customcnf,worker
14 e22-h18-b02-fc640 customcnf,worker
14 e21-h24-b02-fc640 customcnf,worker
14 e20-h26-b04-fc640 customcnf,worker
14 e20-h20-b04-fc640 customcnf,worker
14 e20-h18-b02-fc640 customcnf,worker
14 e19-h26-b02-fc640 customcnf,worker
13 e23-h26-b01-fc640 customcnf,worker
13 e23-h18-b02-fc640 customcnf,worker
13 e23-h14-b04-fc640 customcnf,worker
13 e23-h12-b03-fc640 customcnf,worker
13 e20-h24-b01-fc640 customcnf,worker
13 e20-h12-b04-fc640 customcnf,worker
13 e20-h12-b01-fc640 customcnf,worker
13 e19-h26-b03-fc640 customcnf,worker
13 e19-h24-b04-fc640 customcnf,worker
13 e19-h20-b04-fc640 customcnf,worker
13 e19-h20-b02-fc640 customcnf,worker
13 e19-h20-b01-fc640 customcnf,worker
13 e19-h18-b03-fc640 customcnf,worker
13 e18-h24-b03-fc640 customcnf,worker
13 e18-h20-b01-fc640 customcnf,worker
13 e18-h18-b02-fc640 customcnf,worker
13 e18-h14-b04-fc640 customcnf,worker
13 e18-h14-b01-fc640 customcnf,worker
13 e17-h18-b03-fc640 customcnf,worker
12 e23-h24-b02-fc640 customcnf,worker
12 e23-h18-b03-fc640 customcnf,worker
12 e23-h18-b01-fc640 customcnf,worker
12 e23-h12-b04-fc640 customcnf,worker
12 e21-h20-b03-fc640 customcnf,worker
12 e20-h26-b01-fc640 customcnf,worker
12 e20-h20-b03-fc640 customcnf,worker
12 e20-h20-b02-fc640 customcnf,worker
12 e20-h20-b01-fc640 customcnf,worker
12 e20-h14-b01-fc640 customcnf,worker
12 e20-h12-b02-fc640 customcnf,worker
12 e19-h18-b04-fc640 customcnf,worker
12 e19-h18-b01-fc640 customcnf,worker
12 e18-h18-b04-fc640 customcnf,worker
12 e16-h26-b04-fc640 customcnf,worker
11 e23-h24-b04-fc640 customcnf,worker
11 e23-h20-b03-fc640 customcnf,worker
11 e22-h18-b01-fc640 customcnf,worker
11 e20-h24-b04-fc640 customcnf,worker
11 e20-h18-b01-fc640 customcnf,worker
11 e20-h14-b04-fc640 customcnf,worker
11 e20-h14-b03-fc640 customcnf,worker
11 e20-h12-b03-fc640 customcnf,worker
11 e19-h24-b01-fc640 customcnf,worker
11 e18-h20-b04-fc640 customcnf,worker
11 e18-h20-b03-fc640 customcnf,worker
11 e18-h20-b02-fc640 customcnf,worker
11 e18-h14-b03-fc640 customcnf,worker
10 e23-h20-b01-fc640 customcnf,worker
10 e19-h26-b04-fc640 customcnf,worker
10 e18-h18-b03-fc640 customcnf,worker
9 e23-h20-b02-fc640 customcnf,worker
9 e20-h14-b02-fc640 customcnf,worker
8 e19-h24-b02-fc640 customcnf,worker
7 e19-h26-b01-fc640 customcnf,worker
6 e20-h24-b03-fc640 customcnf,worker
6 e20-h24-b02-fc640 customcnf,worker
6 e19-h20-b03-fc640 customcnf,worker
5 e19-h24-b03-fc640 customcnf,worker
5 e18-h18-b01-fc640 customcnf,worker
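As mentioned in the description, node/pod affinities and topology spread constraints were tried to influence the distribution. For reference, the constraint in the server spec under Additional info only selects app: nginx pods and uses whenUnsatisfiable: ScheduleAnyway, i.e. it is a soft preference scoped to a single pod type. A stricter sketch that spreads all VF-consuming pods as one group would look roughly like the following; the shared vf-consumer label is hypothetical and would have to be added to both server and dpdk pods:

  # Sketch only (not the configuration used in this report):
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule     # hard constraint instead of ScheduleAnyway
    labelSelector:
      matchLabels:
        vf-consumer: "true"              # hypothetical label shared by all VF-consuming pods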
Version-Release number of selected component (if applicable):
4.18
How reproducible:
Always with this workload at a scale where at least 50% of the namespaces are churned (100% of the namespaces means 115). PS: to rule out SR-IOV itself as a source of the problem, I reset the SR-IOV VFs and restarted the config-daemon to verify that the VFs corresponding to deleted namespaces are returned to the system.
Steps to Reproduce:
1. Run the workload to deploy the 115 namespaces with ~6000 pods in total (20 server / 30 client / 2 dpdk pods per namespace).
2. Churn (delete and recreate) at least 50% of the namespaces (57).
3. Observe dpdk pods stuck Pending with only about 70 namespaces recreated, and a consistently bad distribution of server pods piling up on the worker-metallb and worker-dpdk nodes (see above).
Actual results:
Many pods cluster on a few nodes instead of using the capacity available across the cluster, exhausting the SR-IOV VF resources on those nodes.
Expected results:
The workload should be able to deploy if the distribution/spread were more even: in total there are 7000+ VFs available (all worker nodes have 64), and the scale of the workload (roughly 55% of ~6000 pods, i.e. ~3300 VF-consuming pods) does not come close to that number.
Additional info:
Workload deployed is kube-burner-ocp rds-core.

Server pod spec, trying to give preference for balance and "worker" nodes only:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        # block 'infra' and 'workload' labeled nodes
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/infra
            operator: DoesNotExist
          - key: node-role.kubernetes.io/workload
            operator: DoesNotExist
          - key: node-role.kubernetes.io/worker
            operator: Exists
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: node-role.kubernetes.io/worker
            operator: Exists
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: nginx
        topologyKey: kubernetes.io/hostname
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: nginx
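Not part of the configuration above, but as a point of reference for the discussion: upstream kube-scheduler allows the NodeResourcesFit scoring to be weighted per resource, so one possible mitigation sketch is to score nodes on the extended resource so that nodes with more free VFs are preferred. The snippet below uses upstream KubeSchedulerConfiguration syntax; whether and how this can be applied to the default scheduler on this cluster is an assumption/open question:

  # Sketch only, assuming the upstream NodeResourcesFit scoring API is usable here.
  apiVersion: kubescheduler.config.k8s.io/v1
  kind: KubeSchedulerConfiguration
  profiles:
  - schedulerName: default-scheduler
    pluginConfig:
    - name: NodeResourcesFit
      args:
        scoringStrategy:
          type: RequestedToCapacityRatio
          resources:
          - name: openshift.io/intelnics2
            weight: 5
          - name: cpu
            weight: 1
          - name: memory
            weight: 1
          requestedToCapacityRatio:
            shape:
            # decreasing shape: nodes with lower VF utilization score higher (spread)
            - utilization: 0
              score: 10
            - utilization: 100
              score: 0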