-
Story
-
Resolution: Unresolved
The current podAntiAffinity configuration improves HA by separating node-healthcheck-controller-manager replicas across nodes.
However, because it uses preferredDuringSchedulingIgnoredDuringExecution, the rule is only a scheduling preference, so both replicas can still be scheduled on the same node under certain conditions.
Separation is therefore not guaranteed, and unexpected co-location can occur, especially during node recovery or scale-up events.
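For reference, a preferred (soft) anti-affinity rule of the kind described above typically looks like the fragment below. This is a minimal sketch and not the operator's actual manifest: the label `app.kubernetes.io/name: node-healthcheck-operator`, the weight, and the `kubernetes.io/hostname` topology key are assumptions chosen for illustration.

```yaml
# Pod template fragment (Deployment .spec.template.spec).
# Sketch only: label, weight, and topology key are assumed, not taken from the operator.
affinity:
  podAntiAffinity:
    # "preferred" is a soft rule: the scheduler scores nodes by it,
    # but may still place both replicas on one node.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: node-healthcheck-operator  # hypothetical label
```

Because the rule only influences scoring, a scheduler that sees a single schedulable node will place both replicas there.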
To improve this, the proposed patch introduces 'topologySpreadConstraints' with 'whenUnsatisfiable: "DoNotSchedule"' and 'maxSkew: 1'.
This configuration enforces strict distribution of replicas across nodes when the cluster has multiple nodes, while still allowing both replicas to run on the same node in a single-node cluster.
It helps prevent replica pods from being scheduled on the same node, which improves fault tolerance in HA deployments.
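The proposed constraint could be expressed roughly as follows. This is a hedged sketch rather than the patch itself: `maxSkew` and `whenUnsatisfiable` come from the description above, while the `kubernetes.io/hostname` topology key and the label selector are assumptions.

```yaml
# Pod template fragment (Deployment .spec.template.spec).
# Sketch only: topologyKey and labelSelector are assumed for illustration.
topologySpreadConstraints:
  - maxSkew: 1                         # max allowed difference in matching pods between nodes
    whenUnsatisfiable: DoNotSchedule   # hard rule: leave the pod Pending instead of violating it
    topologyKey: kubernetes.io/hostname  # assumption: one topology domain per node
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: node-healthcheck-operator  # hypothetical label
```

Skew is measured against the least-loaded eligible node. In a single-node cluster that node is also the minimum, so a second replica keeps the skew within 1 and is admitted. With two or more nodes, placing the second replica next to the first would produce a skew of 2 against an empty node, so the scheduler rejects that placement.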
This patch introduces the following change:
[Before merging this patch]
- In single-node environments, both node-healthcheck-controller-manager replicas are scheduled on the same node.
- In multi-node environments, the scheduler tries to place the replicas on separate nodes.
- However, during node recovery scenarios, such as planned maintenance where nodes come back online one by one, both replicas may still be scheduled on the same node.
[After merging this patch]
- In single-node environments, both replicas are scheduled on the same node as expected.
- In multi-node environments, the scheduler strictly enforces distribution: each replica is placed on a different node.
- In cases where only one node is available (e.g. during recovery), the second pod will not be scheduled until another node becomes available, preventing both pods from being placed on the same node.
This change enforces strict scheduling behavior: it improves fault tolerance in multi-node environments and leaves single-node environments working as before.
- causes
  RHWA-366 Investigate update strategy with topologySpreadConstraints (New)
- is related to
  RHWA-363 SNR: use 2 replicas (Review)
  RHWA-308 [NHC] Improve HA by enforcing podAntiAffinity for controller-manager replicas (Closed)
- relates to
  RHWA-364 [FAR] Improve HA by using 'topologySpreadConstraints' to enforce strict pod distribution for fence-agents-controller-manager replicas (Review)
- links to