Type: Feature
Resolution: Done
Priority: Normal
Fix Version/s: rhwa-25.8
Affects Version/s: None
Component/s: Node Healthcheck
Labels: None
Blocked: False
Ready: False
Target Version: rhwa-25.8


The node-healthcheck-controller-manager replicas may be scheduled on the same node, which reduces the effectiveness of high availability.

This issue can occur when nodes are being brought back online one by one, such as during maintenance.
In such cases, both replicas may be placed on a single node, creating a single point of failure.

This patch introduces the following change:

[Before merging this patch]
The default scheduler may place both replicas on the same node, especially when few nodes are available.

[After merging this patch]
The node-healthcheck-controller-manager deployment includes a podAntiAffinity rule using requiredDuringSchedulingIgnoredDuringExecution, ensuring that replicas are scheduled on separate nodes.
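
As a rough illustration of the rule described above, the following Python sketch builds the `affinity` stanza that would sit under `spec.template.spec` of the controller-manager Deployment. The label key and value used in the selector are hypothetical placeholders, since this issue does not state which labels the pods carry; `kubernetes.io/hostname` is the standard per-node topology key. It is a sketch of the shape of the rule, not the actual patch.

```python
import json


def build_pod_anti_affinity(match_labels, topology_key="kubernetes.io/hostname"):
    """Return an affinity stanza that forbids co-locating matching pods.

    With requiredDuringSchedulingIgnoredDuringExecution, the scheduler will
    not place a pod in a topology domain (here: a node) that already runs a
    pod matching the label selector.
    """
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": dict(match_labels)},
                    "topologyKey": topology_key,
                }
            ]
        }
    }


# ASSUMPTION: this label is a placeholder; use the labels that the
# controller-manager pods actually carry in the deployed operator.
example_labels = {"app.kubernetes.io/name": "example-controller-manager"}

affinity = build_pod_anti_affinity(example_labels)

# Print the fragment as it would appear under spec.template.spec.
print(json.dumps({"affinity": affinity}, indent=2))
```

Because the rule is "required" rather than "preferred", a replica that cannot be placed on a separate node stays Pending instead of being co-located with the other replica.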

Scheduling both replicas on the same node introduces a single point of failure and should be avoided in HA configurations.
When only one node is available for scheduling, such as during planned maintenance or recovery, both replicas can land on that node, and losing it can delay remediation or cause it to be missed.
To improve fault tolerance, use podAntiAffinity rules so that each replica runs on a different node, which makes the remediation process more resilient.
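
To check whether a cluster is currently affected, one option is to compare the nodes on which the replicas run. The sketch below assumes the pod-to-node mapping has already been collected (for example from the pod list of the operator namespace); the pod and node names are made up for illustration.

```python
from collections import defaultdict


def colocated_replicas(pod_to_node):
    """Return {node: [pods]} for every node that hosts more than one replica."""
    pods_by_node = defaultdict(list)
    for pod, node in pod_to_node.items():
        pods_by_node[node].append(pod)
    return {
        node: sorted(pods)
        for node, pods in pods_by_node.items()
        if len(pods) > 1
    }


# Hypothetical placements: both replicas ended up on the same node.
before_patch = {
    "controller-manager-a": "worker-0",
    "controller-manager-b": "worker-0",
}
# Hypothetical placements: replicas are spread across two nodes.
after_patch = {
    "controller-manager-a": "worker-0",
    "controller-manager-b": "worker-1",
}

print(colocated_replicas(before_patch))  # {'worker-0': ['controller-manager-a', 'controller-manager-b']}
print(colocated_replicas(after_patch))   # {}
```

An empty result means no node hosts more than one replica, which is the placement the anti-affinity rule is meant to guarantee.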


Attachments

nhc_pod_aff.text (14 kB, 2025/09/12 6:03 PM)
nhc_pod_aff_4_18_disconnected_12_sept.text (51 kB, 2025/09/11 9:59 PM)
nhc_far_15_sept_pod_aff_delay_max_health.text (125 kB, 2025/09/15 3:59 PM)

clones

RHWA-283 [FAR] Improve HA by enforcing podAntiAffinity for controller-manager replicas (Closed)

relates to

RHWA-365 [NHC] Improve HA by using 'topologySpreadConstraints' to enforce strict pod distribution for node-healthcheck-controller-manager replicas

Review

Assignee: Or Raz
Reporter: KATSUYA KAWAKAMI
Contributors: Marc Sluiter
Votes: 0
Watchers: 2
Created: 2025/09/11 1:53 PM
Updated: 2025/10/11 7:39 AM
Resolved: 2025/09/15 3:59 PM
