-
Feature
-
Resolution: Done
-
Normal
The node-healthcheck-controller-manager replicas may be scheduled on the same node, which reduces the effectiveness of high availability.
This issue can occur when nodes are being brought back online one by one, such as during maintenance.
In such cases, both replicas may be placed on a single node, creating a single point of failure.
This patch introduces the following change:
[Before merging this patch]
The default scheduler may place both replicas on the same node, especially when few nodes are available.
[After merging this patch]
The fence-agents-controller-manager deployment includes a podAntiAffinity rule using requiredDuringSchedulingIgnoredDuringExecution, ensuring that replicas are scheduled on separate nodes.
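For illustration, a rule of this kind could look roughly like the fragment below. This is a minimal sketch, not the patch itself: the label key and value in the selector are hypothetical placeholders, and the actual selector used by the deployment may differ. The topologyKey kubernetes.io/hostname is the standard node-level key.

```yaml
# Illustrative Deployment fragment; the label key/value are assumptions,
# not taken from the actual patch.
spec:
  replicas: 2
  template:
    spec:
      affinity:
        podAntiAffinity:
          # Hard requirement: the scheduler refuses to place a pod on a node
          # that already runs a pod matching the selector below.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/name: example-controller-manager  # hypothetical label
              # One pod per node: nodes are distinguished by hostname.
              topologyKey: kubernetes.io/hostname
```

Because the rule is a hard requirement, a second replica stays Pending when only one schedulable node exists, and is scheduled once another node becomes available. The rule is evaluated only at scheduling time, so pods that are already running are not moved.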
Scheduling both replicas on the same node introduces a single point of failure and should be avoided in HA configurations.
When only one node is schedulable, such as during planned maintenance or recovery, both replicas can end up on that node; if it then fails, remediation may be delayed or missed.
To help improve fault tolerance, it is recommended to use podAntiAffinity rules so that each replica runs on a different node.
By preventing both replicas from running on the same node, this setup enhances the resilience of the remediation process.