OpenShift Bugs / OCPBUGS-58099

Node degraded with 'rpm-ostree cleanup -r: : signal: terminated'

      Description of problem:

      
      When an MCD is restarted, it executes the rpm-ostree cleanup -r command. If the node is shut down while this command is running, the pool becomes degraded with this error:
      
        - lastTransitionTime: "2025-06-25T13:15:58Z"
          message: 'Node ip-10-0-83-147.us-east-2.compute.internal is reporting: "failed
            to remove rollback: error running rpm-ostree cleanup -r: : signal: terminated"'
          reason: 1 nodes are reporting degraded status on sync
          status: "True"
          type: NodeDegraded
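
      For context, the "signal: terminated" suffix in the message above is simply how Go's os/exec package reports a child process that was killed by SIGTERM. A minimal standalone sketch (not MCO code; sleep stands in for rpm-ostree) reproduces the exact wording:

      package main

      import (
          "fmt"
          "os/exec"
          "syscall"
          "time"
      )

      func main() {
          // Stand-in for "rpm-ostree cleanup -r": any long-running child behaves the same.
          cmd := exec.Command("sleep", "60")
          if err := cmd.Start(); err != nil {
              panic(err)
          }
          // Simulate systemd tearing the child down during "shutdown now".
          go func() {
              time.Sleep(100 * time.Millisecond)
              _ = cmd.Process.Signal(syscall.SIGTERM)
          }()
          // Wait returns an *exec.ExitError that formats as "signal: terminated".
          fmt.Printf("error running sleep: %v\n", cmd.Wait())
      }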
      
      
      

      Version-Release number of selected component (if applicable):

      4.19.0-0.nightly-2025-06-24-180820
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Scale up a node
      
      2. As soon as the node joins, watch the MCD log until these lines appear:
      
      I0625 13:15:48.408802    2666 daemon.go:1620] Previous boot ostree-finalize-staged.service appears successful
      I0625 13:15:48.408850    2666 daemon.go:1755] Current+desired config: rendered-worker-ac10cd07b163bf86670395119e135489
      I0625 13:15:48.408870    2666 daemon.go:1770] state: Done
      I0625 13:15:48.408902    2666 update.go:2741] Running: rpm-ostree cleanup -r
      
      
      3. Shut down the node with "shutdown now" while this command is running
      
      

      Actual results:

      The worker MCP is permanently degraded with the error mentioned above
      
      We can see these logs in the MCD:
      
      
      I0625 13:15:47.473923    2666 image_manager_helper.go:92] Running captured: systemctl list-units --state=failed --no-legend
      I0625 13:15:47.481529    2666 daemon.go:1827] systemd service state: OK
      I0625 13:15:47.481571    2666 daemon.go:1380] Starting MachineConfigDaemon
      I0625 13:15:47.481626    2666 daemon.go:1387] Enabling Kubelet Healthz Monitor
      I0625 13:15:47.521512    2666 daemon.go:2979] Found 0 requested local packages in the booted deployment
      I0625 13:15:48.357928    2666 daemon.go:670] Node ip-10-0-83-147.us-east-2.compute.internal is not labeled node-role.kubernetes.io/master
      I0625 13:15:48.360434    2666 daemon.go:2032] Running: /run/machine-config-daemon-bin/nmstatectl persist-nic-names --root / --kargs-out /tmp/nmstate-kargs229921083 --cleanup
      [2025-06-25T13:15:48Z INFO  nmstatectl] Nmstate version: 2.2.45
      [2025-06-25T13:15:48Z INFO  nmstatectl::persist_nic] /etc/systemd/network does not exist, no need to clean up
      
      I0625 13:15:48.389643    2666 node.go:23] No machineconfiguration.openshift.io/currentConfig annotation on node ip-10-0-83-147.us-east-2.compute.internal: map[cloud.network.openshift.io/egress-ipconfig:[{"interface":"eni-0464701685a8a3c8f","ifaddr":{"ipv4":"10.0.64.0/19"},"capacity":{"ipv4":14,"ipv6":15}}] k8s.ovn.org/node-gateway-router-lrp-ifaddrs:{"default":{"ipv4":"100.64.0.16/16"}} k8s.ovn.org/node-id:16 k8s.ovn.org/node-subnets:{"default":["10.130.6.0/23"]} k8s.ovn.org/node-transit-switch-port-ifaddr:{"ipv4":"100.88.0.16/16"} machine.openshift.io/machine:openshift-machine-api/sregidor-v19-cpjc4-worker-us-east-2c-xshr7 machineconfiguration.openshift.io/controlPlaneTopology:HighlyAvailable volumes.kubernetes.io/controller-managed-attach-detach:true], in cluster bootstrap, loading initial node annotation from /etc/machine-config-daemon/node-annotations.json
      I0625 13:15:48.390405    2666 node.go:52] Setting initial node config: rendered-worker-ac10cd07b163bf86670395119e135489
      I0625 13:15:48.402032    2666 daemon.go:1682] In bootstrap mode
      I0625 13:15:48.408802    2666 daemon.go:1620] Previous boot ostree-finalize-staged.service appears successful
      I0625 13:15:48.408850    2666 daemon.go:1755] Current+desired config: rendered-worker-ac10cd07b163bf86670395119e135489
      I0625 13:15:48.408870    2666 daemon.go:1770] state: Done
      I0625 13:15:48.408902    2666 update.go:2741] Running: rpm-ostree cleanup -r
      Bootloader updated; bootconfig swap: yes; bootversion: boot.1.1, deployment count change: -1
      E0625 13:15:58.244754    2666 writer.go:231] Marking Degraded due to: "failed to remove rollback: error running rpm-ostree cleanup -r: : signal: terminated"
      I0625 13:15:58.282753    2666 daemon.go:759] Transitioned from state: Done -> Degraded
      I0625 13:15:58.282772    2666 daemon.go:762] Transitioned from degraded/unreconcilable reason  -> failed to remove rollback: error running rpm-ostree cleanup -r: : signal: terminated
      I0625 13:15:58.286390    2666 daemon.go:2032] Running: /run/machine-config-daemon-bin/nmstatectl persist-nic-names --root / --kargs-out /tmp/nmstate-kargs4193065622 --cleanup
      
      
      

      Expected results:

      No degradation should happen
      

      Additional info:

      
      I haven't tried other rpm-ostree commands. The node may also become degraded when it is shut down while the MCO is running any other rpm-ostree command, but I'm not sure; I've only checked the cleanup command. A possible way to classify these interruptions is sketched below.
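
      A hypothetical handling sketch (this is not the actual MCO code, and isKilledByShutdownSignal is an invented helper name): the daemon could inspect the exec error and treat an rpm-ostree child killed by SIGTERM/SIGKILL as a transient failure to retry on the next sync, instead of marking the node Degraded. Because the check only looks at the wait status, it would apply to any rpm-ostree subcommand, not just cleanup -r:

      package main

      import (
          "errors"
          "fmt"
          "os/exec"
          "syscall"
      )

      // Hypothetical helper: report whether the command failed because the child
      // was killed by a shutdown-style signal (Unix-only wait status check).
      func isKilledByShutdownSignal(err error) bool {
          var exitErr *exec.ExitError
          if !errors.As(err, &exitErr) {
              return false
          }
          ws, ok := exitErr.Sys().(syscall.WaitStatus)
          if !ok {
              return false
          }
          return ws.Signaled() && (ws.Signal() == syscall.SIGTERM || ws.Signal() == syscall.SIGKILL)
      }

      func main() {
          // Same stand-in as above: a long-running child interrupted by SIGTERM.
          cmd := exec.Command("sleep", "60")
          if err := cmd.Start(); err != nil {
              panic(err)
          }
          _ = cmd.Process.Signal(syscall.SIGTERM)
          if err := cmd.Wait(); isKilledByShutdownSignal(err) {
              fmt.Println("transient: interrupted by shutdown, retry on next sync instead of degrading")
          } else if err != nil {
              fmt.Println("real failure, mark degraded:", err)
          }
      }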
      
      Slack conversation:
      https://redhat-internal.slack.com/archives/GH7G2MANS/p1750857107077139
      
      

      Assignee: Isabella Janssen (rh-ee-ijanssen)
      Reporter: Sergio Regidor de la Rosa (sregidor@redhat.com)