Type: Story
Resolution: Done
Priority: Normal
Fix Version/s: rhwa-4.21-0
Affects Version/s: None
Component/s: Node Healthcheck
Labels:
- No-Doc-Update

Blocked: False
Blocked Reason: None
Ready: False
Target Version: rhwa-4.21-0

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

The current podAntiAffinity configuration improves HA by separating node-healthcheck-controller-manager replicas across nodes.
However, because it uses preferredDuringSchedulingIgnoredDuringExecution, the rule is only a preference: the scheduler may still place both replicas on the same node under certain conditions.
Separation is therefore not guaranteed, and unexpected co-location can occur, especially during node recovery or scale-up events.
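
For illustration, a preferred (soft) anti-affinity rule of the kind described above could look like the following fragment of a Deployment pod spec. This is a sketch, not the operator's actual manifest: the weight, the topology key and the label selector are assumptions made for the example.

```yaml
# Sketch only: weight, topologyKey and labels are assumed, not taken from the operator.
affinity:
  podAntiAffinity:
    # "preferred" is a soft rule: the scheduler tries to honor it,
    # but still schedules the pod if no other node fits.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: node-healthcheck-operator  # hypothetical label
```

Because the rule is weighted rather than required, a replica that becomes schedulable while only one node is ready can land next to the other replica.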

To improve this, the proposed patch introduces 'topologySpreadConstraints' with 'whenUnsatisfiable: "DoNotSchedule"' and 'maxSkew: 1'.
This configuration ensures strict distribution of replicas across nodes when multiple nodes are available, while still allowing scheduling on a single node.
It helps prevent replica pods from being scheduled on the same node, which improves fault tolerance in HA deployments.
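
As a rough sketch of the proposed setting, a constraint with the two values named above could be written as follows. Only 'maxSkew: 1' and 'whenUnsatisfiable: "DoNotSchedule"' come from this issue; the topology key and the label selector are assumptions, and the merged patch may differ.

```yaml
# Sketch only: topologyKey and labels are assumed; see the linked pull request for the real change.
topologySpreadConstraints:
  - maxSkew: 1                            # from this issue
    whenUnsatisfiable: DoNotSchedule      # from this issue: hard constraint, pod stays Pending
    topologyKey: kubernetes.io/hostname   # assumed: one topology domain per node
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: node-healthcheck-operator  # hypothetical label
```

Skew is the number of matching pods in a domain minus the minimum across eligible domains. With a single node there is only one domain, so the skew stays 0 and both replicas can be scheduled there. With two or more nodes, placing both replicas on one node would give a skew of 2, which the scheduler rejects.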

This patch introduces the following change:

[Before merging this patch]

In single-node environments, both node-healthcheck-controller-manager replicas are scheduled on the same node.
In multi-node environments, the scheduler tries to place the replicas on separate nodes.
- However, during node recovery scenarios, such as planned maintenance where nodes come online one by one, both replicas may still be scheduled on the same node.

[After merging this patch]

In single-node environments, both replicas are scheduled on the same node as expected.
In multi-node environments, the scheduler strictly enforces distribution: each replica is placed on a different node.
- In cases where only one node is available (e.g. during recovery), the second pod will not be scheduled until another node becomes available, preventing both pods from being placed on the same node.

This change helps ensure strict scheduling behavior: it improves fault tolerance in multi-node environments while keeping both replicas schedulable in single-node environments.


RHWA-365-connected-4.21-topologySpreadConstraints-node-healthcheck-controller-manager.text
2026/02/13 11:12 AM
12 kB
vipin kumar

causes

RHWA-366 Investigate update strategy with topologySpreadConstraints

Closed

is related to

RHWA-363 SNR: use 2 replicas

Closed

RHWA-308 [NHC] Improve HA by enforcing podAntiAffinity for controller-manager replicas

Closed

relates to

RHWA-364 [FAR] Improve HA by using 'topologySpreadConstraints' to enforce strict pod distribution for fence-agents-controller-manager replicas

Closed

links to

medik8s/node-healthcheck-operator#383: Use strict pod placement for controller-manager

mentioned on

Merge request - TELCODOCS-2597: RHWA 4.21-0 Release Notes first draft / Common Attributes...


Assignee: Marc Sluiter

Reporter: KATSUYA KAWAKAMI

Votes: 0

Watchers: 5

Created: 2025/10/02 6:04 PM

Updated: 2026/02/24 12:56 PM

Resolved: 2026/01/12 4:05 PM
