Bug
Resolution: Done-Errata
Critical
4.14.z, 4.15.0
No
Hypershift Sprint 246, Hypershift Sprint 247
2
False
Release Note Not Required
In Progress
Description of problem:
In the self-managed HCP use case, if the on-premise baremetal management cluster does not have nodes labeled with the "topology.kubernetes.io/zone" key, then all HCP pods for a highly available (HA) cluster are scheduled to a single mgmt cluster node. This is a result of the way the affinity rules are constructed. Take the pod affinity/antiAffinity example below, which is generated for an HA HCP cluster. If the "topology.kubernetes.io/zone" label does not exist on the mgmt cluster nodes, the pod still gets scheduled, but the antiAffinity rule is effectively ignored. That seems odd given the use of "requiredDuringSchedulingIgnoredDuringExecution", but I have tested this and the rule truly is ignored if the topologyKey label is not present on the nodes. The likely explanation is that a node without the label belongs to no topology domain for that key, so the rule has nothing to match against.
podAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - podAffinityTerm:
      labelSelector:
        matchLabels:
          hypershift.openshift.io/hosted-control-plane: clusters-vossel1
      topologyKey: kubernetes.io/hostname
    weight: 100
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchLabels:
        app: kube-apiserver
        hypershift.openshift.io/control-plane-component: kube-apiserver
    topologyKey: topology.kubernetes.io/zone
In the event that no "zones" are configured for the baremetal mgmt cluster, the only other pod affinity rule is one that actually colocates the pods. This results in an HA HCP having all of its etcd, kube-apiserver, and other control plane pods scheduled to a single node.
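The behavior described above can be illustrated with a small model. This is only a sketch of the check as I understand it, not the scheduler's actual code; the function name `violates_required_anti_affinity` and the node name `worker-0` are made up for illustration. It assumes a required anti-affinity term can only be violated when the candidate node carries the topologyKey label.

```python
ZONE_KEY = "topology.kubernetes.io/zone"
HOSTNAME_KEY = "kubernetes.io/hostname"


def violates_required_anti_affinity(candidate_labels, occupied_node_labels, topology_key):
    """Simplified model of a required pod anti-affinity check.

    candidate_labels:     labels of the node being considered for a new replica
    occupied_node_labels: labels of nodes already running a matching replica
    topology_key:         the topologyKey of the anti-affinity term
    """
    candidate_value = candidate_labels.get(topology_key)
    if candidate_value is None:
        # Assumption: a node without the label is in no topology domain,
        # so the term cannot be violated on it.
        return False
    return any(
        labels.get(topology_key) == candidate_value
        for labels in occupied_node_labels
    )


# A baremetal node with a hostname label but no zone label (hypothetical name).
unzoned_node = {HOSTNAME_KEY: "worker-0"}

# One replica already runs on worker-0; the candidate is worker-0 again.
print(violates_required_anti_affinity(unzoned_node, [unzoned_node], ZONE_KEY))      # False
print(violates_required_anti_affinity(unzoned_node, [unzoned_node], HOSTNAME_KEY))  # True
```

With the zone key, a second replica is allowed onto the same unlabeled node, which matches the observed behavior. With the hostname key, the same placement is rejected.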
Version-Release number of selected component (if applicable):
4.14
How reproducible:
100%
Steps to Reproduce:
1. Create a self-managed HA HCP cluster on a mgmt cluster with nodes that lack the "topology.kubernetes.io/zone" label
Actual results:
All HCP pods are scheduled to a single node.
Expected results:
HCP pods should always be spread across multiple nodes.
Additional info:
One way to address this is to add another anti-affinity rule that prevents each component from being scheduled on the same node as its other replicas.
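A rough sketch of what such an extra rule could look like, written as a Python dict that mirrors the pod spec fields from the example in the description. The helper name `add_hostname_anti_affinity` is hypothetical, and this is not necessarily how the actual fix is implemented; it only shows a second required term keyed on "kubernetes.io/hostname" next to the existing zone term.

```python
import copy

HOSTNAME_KEY = "kubernetes.io/hostname"


def add_hostname_anti_affinity(affinity, match_labels):
    """Return a copy of `affinity` with an extra required anti-affinity
    term that keeps replicas matching `match_labels` off the same node."""
    result = copy.deepcopy(affinity)
    terms = result.setdefault("podAntiAffinity", {}).setdefault(
        "requiredDuringSchedulingIgnoredDuringExecution", []
    )
    terms.append(
        {
            "labelSelector": {"matchLabels": dict(match_labels)},
            "topologyKey": HOSTNAME_KEY,
        }
    )
    return result


# Labels taken from the kube-apiserver example in the description.
kas_labels = {
    "app": "kube-apiserver",
    "hypershift.openshift.io/control-plane-component": "kube-apiserver",
}

# The existing zone-based term from the description.
existing = {
    "podAntiAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": [
            {
                "labelSelector": {"matchLabels": kas_labels},
                "topologyKey": "topology.kubernetes.io/zone",
            }
        ]
    }
}

updated = add_hostname_anti_affinity(existing, kas_labels)
for term in updated["podAntiAffinity"]["requiredDuringSchedulingIgnoredDuringExecution"]:
    print(term["topologyKey"])
```

One trade-off to keep in mind: a required hostname term means a component with N replicas needs at least N schedulable nodes, otherwise the extra replicas stay Pending.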
- blocks
OCPBUGS-28764 Self-managed HCP pods are scheduled on single mgmt cluster node when no zones are in use (Closed)
- is cloned by
OCPBUGS-28764 Self-managed HCP pods are scheduled on single mgmt cluster node when no zones are in use (Closed)
ACM-11454 Self-managed HCP pods are scheduled on single mgmt cluster node when no zones are in use (Closed)
- links to
RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update