Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.17.z, 4.18.z
Component/s: RHCOS
Labels:
- mco-triaged
- rhcos-engaged

Activity Type:
None
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
None
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

    After OpenShift worker nodes are rebooted, NetworkManager does not start automatically, so the node remains in the NotReady state. Without networking, the node cannot pull images or rejoin the cluster.
Manual intervention is required to regain access to the node and start NetworkManager, which restores only partial functionality: kubelet still does not start automatically, and the node never recovers on its own, even after an extended wait.
This behavior has been observed consistently across multiple clusters, indicating a systemic issue rather than a one-off node failure.
The issue appears similar to https://issues.redhat.com/browse/OCPBUGS-36198, suggesting a possible regression or a related condition involving the Machine Config Operator and NetworkManager initialization logic.

Version-Release number of selected component (if applicable):

    4.17.z, 4.18.z

How reproducible:

    Not reproducible in our own environment.

Steps to Reproduce:

    1. Reproducible only in the customer environment.

Actual results:

    - NetworkManager does not start automatically after reboot.
    - Node remains stuck in NotReady.
    - Images are not pulled due to lack of networking.
    - Manual recovery steps required:
        1. Reset core user password.
        2. SSH into the node.
        3. Manually start NetworkManager:
           systemctl start NetworkManager
    - After NetworkManager starts, image pulls begin.
    - kubelet still does not start automatically, even after waiting for days.
    - Node never recovers without further manual intervention.
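
To record the state described above on an affected node, a small helper along these lines could be run after the reboot. This is only a sketch and not part of the original report: the function name `report_units` and the `SYSTEMCTL` override variable are hypothetical, and the unit names `NetworkManager.service` and `kubelet.service` are assumed to be the ones in use on the node.

```shell
# Hypothetical helper, not part of the original report.
# SYSTEMCTL can be overridden (for example with a stub) where systemctl is unavailable.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Print one line per unit with its enablement and runtime state.
report_units() {
    for ru_unit in "$@"; do
        # Both subcommands exit non-zero for disabled or inactive units,
        # so the exit status is ignored and only the printed state is kept.
        ru_enabled="$("$SYSTEMCTL" is-enabled "$ru_unit" 2>/dev/null)" || true
        ru_active="$("$SYSTEMCTL" is-active "$ru_unit" 2>/dev/null)" || true
        printf '%s enabled=%s active=%s\n' \
            "$ru_unit" "${ru_enabled:-unknown}" "${ru_active:-unknown}"
    done
}

# Example: report_units NetworkManager.service kubelet.service
```

A unit that reports as enabled but inactive after boot would match the behavior reported here; the output alone does not explain why the unit was not started.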

Expected results:

    After reboot:
    - NetworkManager should start automatically.
    - kubelet should start automatically once networking is available.
    - Node should transition back to Ready state without manual intervention.

    Nodes should recover fully after reboot, as expected in a production OpenShift cluster.

Additional info:

    Jan 30 09:09:00 node.example.com systemd[1]: Cleans NetworkManager state generated by dracut was skipped because of an unmet condition check (ConditionPathExists=/var/lib/mco/nm-clean-initrd-state).
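
When reviewing a full boot journal, entries of this kind can be filtered out with a short function such as the one below. It is a sketch under assumptions: the name `skipped_conditions` is hypothetical, and each journal entry is assumed to sit on a single line, as `journalctl` prints it.

```shell
# Hypothetical helper, not part of the original report.
# Reads journal text on stdin and prints the condition expression of every
# "was skipped because of an unmet condition check (...)" message.
skipped_conditions() {
    sed -n 's/.*was skipped because of an unmet condition check (\([^)]*\)).*/\1/p'
}

# Example: journalctl -b | skipped_conditions
```

For the entry above this prints `ConditionPathExists=/var/lib/mco/nm-clean-initrd-state`, meaning the path did not exist when systemd evaluated the unit. Whether that skip is related to NetworkManager not starting has not been established in this report.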

Assignee: Unassigned

Reporter: Vishvranjan Mishra

Need Info From: None

Contributors: None

QA Contact: Sergio Regidor de la Rosa

Doc Contact: None

Votes: 0

Watchers: 5

Created: 2026/02/02 6:07 PM

Updated: 2026/02/19 7:05 PM
