Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: 4.21.0
Affects Version/s: 4.19, 4.20
Component/s: Machine Config Operator
Labels:
- mco-triaged

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
Important
Regression:
None
Architecture:

All

Target Backport Versions:

4.19.0, 4.20.0
Target Version:

4.21.0
Release Blocker:
Rejected
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
In Progress
Release Note Type:
Release Note Not Required
Release Note Text:
N/A

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

While the MCO is updating a node, under certain conditions (e.g., an unstable network), a failed cordon request can leave the Unschedulable state of the node object held by the MCO out of sync with the node's actual Unschedulable state in the cluster. The MCO then hangs on that node until it eventually times out.
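
The desynchronization described above can be illustrated with a small, self-contained simulation. This is only a sketch, not MCO code: `Node`, `FakeCluster`, `cordon`, and both retry functions are hypothetical names, and the model assumes that the cordon helper updates its local copy of the node before the request to the cluster has succeeded. Under that assumption, a retry that reuses the same node object believes the node is already cordoned and never sends another request, while a retry that re-reads the node first converges.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a node object; only the relevant field."""
    name: str
    unschedulable: bool = False


class FakeCluster:
    """Hypothetical API server that fails the first `fail_first` patch requests."""

    def __init__(self, node, fail_first=1):
        self._node = node
        self._failures_left = fail_first

    def get(self):
        # Return a fresh copy of the cluster's view, as a real GET would.
        return Node(self._node.name, self._node.unschedulable)

    def patch_unschedulable(self, desired):
        if self._failures_left > 0:
            self._failures_left -= 1
            raise ConnectionError("simulated network error")
        self._node.unschedulable = desired


def cordon(cluster, node, desired=True):
    """Assumed helper behaviour: skip when the local copy already matches,
    otherwise change the local copy and then send the request."""
    if node.unschedulable == desired:
        return  # nothing to do according to the local copy
    node.unschedulable = desired  # local copy changes before the request succeeds
    cluster.patch_unschedulable(desired)


def retry_with_stale_node(cluster, attempts=3):
    node = cluster.get()  # fetched once and reused across retries
    for _ in range(attempts):
        try:
            cordon(cluster, node)
            break
        except ConnectionError:
            continue
    return cluster.get().unschedulable


def retry_with_fresh_node(cluster, attempts=3):
    for _ in range(attempts):
        try:
            cordon(cluster, cluster.get())  # re-read the node before every attempt
            break
        except ConnectionError:
            continue
    return cluster.get().unschedulable


stale_result = retry_with_stale_node(FakeCluster(Node("worker-0")))
fresh_result = retry_with_fresh_node(FakeCluster(Node("worker-0")))
print(stale_result, fresh_result)  # prints: False True
```

In this model the stale-node retry leaves the cluster's node schedulable even though the caller's copy says otherwise, which matches the reported hang. Re-reading the node before each attempt corresponds in spirit to the linked fix, whose title refers to passing the latest updated node state to `RunCordonOrUncordon`.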

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1. Create a MachineConfig that triggers a node drain and apply it.
    2. When a specific node starts updating, intercept the first cordon patch/update request and force it to fail.
    3. Observe the MCP status and wait for the operation to hang and eventually time out.

Actual results:

    The MCO gets stuck on a specific node during the update, repeatedly attempting to cordon it while the node state does not progress.

Expected results:

    Even if the first cordon attempt hits a network error, retries should recover and converge to the correct Unschedulable state.

Additional info:

blocks

OCPBUGS-63127 MCO may hang until timeout when cordoning a node

Closed

is cloned by

OCPBUGS-63127 MCO may hang until timeout when cordoning a node

Closed

links to

openshift/machine-config-operator#5305: OCPBUGS-62341: Ensure the node passed to RunCordonOrUncordon comes from the latest updated state

Assignee: Chi Zhang (Inactive)

Reporter: Chi Zhang (Inactive)

Need Info From: None

Votes: 0

Watchers: 5

Created: 2025/09/29 4:32 PM

Updated: 2026/02/10 9:54 AM

Resolved: 2026/02/10 9:54 AM
