Bug
Resolution: Unresolved
Critical
4.17.z
Quality / Stability / Reliability
Customer Escalated
Description of problem:
In a management cluster with 55 Hosted Control Planes (HCPs) running on 18 nodes (3 masters + 15 workers), 352 HCP pods from 27 different hosted clusters (49% of all hosted clusters) were scheduled onto a single worker node (we08d23). The cluster is experiencing severe latency, with up to 20+ second delays for basic system operations such as `busctl` calls to systemd-hostnamed. The HCP workloads are unevenly distributed among the 15 worker nodes, putting some nodes under heavy pressure; 19 kube-apiserver pods landed on that one worker.
$ cat sos_commands/crio/crictl_pods | grep "clusters-" | awk '{print $6}' | grep kube-apiserver | wc -l
19
$ cat sos_commands/crio/crictl_pods | grep "clusters-" | wc -l
352
$ cat sos_commands/crio/crictl_pods | grep "clusters-" | awk '{print $7}' | cut -d- -f1-2 | sort -u | wc -l
27
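For comparison, on a live management cluster the per-node spread of HCP pods can be listed directly. This is a rough sketch, assuming all HCP namespaces carry the clusters- prefix seen in the crictl output; column 8 of `oc get pods -o wide` is the NODE column:
$ oc get pods -A -o wide --no-headers | grep '^clusters-' | awk '{print $8}' | sort | uniq -c | sort -rn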
Environment:
OpenShift Version: 4.17
Worker Nodes: 15 total (256 CPUs each)
- Reserved CPUs for housekeeping: 32 (CPUs 0-31) - via Performance Profile
- Isolated CPUs: 224 (CPUs 32-255)
Hosted Control Planes: ~55
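The allocatable CPU the scheduler actually sees on the affected worker can be read back as follows (node name taken from the description; given 32 reserved CPUs the value should be roughly 256 - 32 = 224, minus any additional kube-reserved/system-reserved overhead):
$ oc get node we08d23 -o jsonpath='{.status.allocatable.cpu}{"\n"}'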
Steps to Reproduce:
1. Deploy 50+ Hosted Control Planes on a management cluster with the spec above.
2. Configure worker nodes with CPU isolation via a Performance Profile, which limits allocatable CPUs (see the check below).
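The configured partitioning from step 2 can be read back from the Performance Profile (the profile name is a placeholder; per the environment above, reserved should be 0-31 and isolated 32-255):
$ oc get performanceprofile <profile-name> -o jsonpath='{.spec.cpu.reserved}{" "}{.spec.cpu.isolated}{"\n"}'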
Actual results:
19 kube-apiserver pods scheduled onto a single worker node.
Expected results:
55 hosted clusters × 3 kube-apiserver replicas (HA) ÷ 15 nodes = 11 kube-apiserver pods per node, i.e. an even spread.
Additional info:
All 19 kube-apiservers belong to different hosted clusters, meaning the pod anti-affinity between replicas of the same hosted cluster is working correctly; the imbalance is across hosted clusters, not within one.
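This can be re-confirmed from the same sosreport data, mirroring the column usage of the earlier commands (where $6 is the pod name and $7 the namespace); the count of distinct hosted clusters among the node's kube-apiserver pods should equal 19:
$ cat sos_commands/crio/crictl_pods | grep "clusters-" | awk '$6 ~ /kube-apiserver/ {print $7}' | cut -d- -f1-2 | sort -u | wc -l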