-
Bug
-
Resolution: Not a Bug
-
Minor
-
4.20.0
-
Quality / Stability / Reliability
-
False
-
-
2
-
Low
-
MCO Sprint 276
-
1
Description of problem:
When the MCD is restarted, it executes the rpm-ostree cleanup -r command. If the node is shut down while this command is running, the pool becomes degraded with this error:

- lastTransitionTime: "2025-06-25T13:15:58Z"
  message: 'Node ip-10-0-83-147.us-east-2.compute.internal is reporting: "failed to remove rollback: error running rpm-ostree cleanup -r: : signal: terminated"'
  reason: 1 nodes are reporting degraded status on sync
  status: "True"
  type: NodeDegraded
Version-Release number of selected component (if applicable):
4.19.0-0.nightly-2025-06-24-180820
How reproducible:
Always
Steps to Reproduce:
1. Scale up a node.
2. When the node has just joined, watch the MCD log until these lines appear:

   I0625 13:15:48.408802 2666 daemon.go:1620] Previous boot ostree-finalize-staged.service appears successful
   I0625 13:15:48.408850 2666 daemon.go:1755] Current+desired config: rendered-worker-ac10cd07b163bf86670395119e135489
   I0625 13:15:48.408870 2666 daemon.go:1770] state: Done
   I0625 13:15:48.408902 2666 update.go:2741] Running: rpm-ostree cleanup -r

3. Shut down the node with "shutdown now" while this command is running.
Actual results:
The worker MCP is permanently degraded with the error mentioned above. The MCD log shows:

I0625 13:15:47.473923 2666 image_manager_helper.go:92] Running captured: systemctl list-units --state=failed --no-legend
I0625 13:15:47.481529 2666 daemon.go:1827] systemd service state: OK
I0625 13:15:47.481571 2666 daemon.go:1380] Starting MachineConfigDaemon
I0625 13:15:47.481626 2666 daemon.go:1387] Enabling Kubelet Healthz Monitor
I0625 13:15:47.521512 2666 daemon.go:2979] Found 0 requested local packages in the booted deployment
I0625 13:15:48.357928 2666 daemon.go:670] Node ip-10-0-83-147.us-east-2.compute.internal is not labeled node-role.kubernetes.io/master
I0625 13:15:48.360434 2666 daemon.go:2032] Running: /run/machine-config-daemon-bin/nmstatectl persist-nic-names --root / --kargs-out /tmp/nmstate-kargs229921083 --cleanup
[2025-06-25T13:15:48Z INFO nmstatectl] Nmstate version: 2.2.45
[2025-06-25T13:15:48Z INFO nmstatectl::persist_nic] /etc/systemd/network does not exist, no need to clean up
I0625 13:15:48.389643 2666 node.go:23] No machineconfiguration.openshift.io/currentConfig annotation on node ip-10-0-83-147.us-east-2.compute.internal: map[cloud.network.openshift.io/egress-ipconfig:[{"interface":"eni-0464701685a8a3c8f","ifaddr":{"ipv4":"10.0.64.0/19"},"capacity":{"ipv4":14,"ipv6":15}}] k8s.ovn.org/node-gateway-router-lrp-ifaddrs:{"default":{"ipv4":"100.64.0.16/16"}} k8s.ovn.org/node-id:16 k8s.ovn.org/node-subnets:{"default":["10.130.6.0/23"]} k8s.ovn.org/node-transit-switch-port-ifaddr:{"ipv4":"100.88.0.16/16"} machine.openshift.io/machine:openshift-machine-api/sregidor-v19-cpjc4-worker-us-east-2c-xshr7 machineconfiguration.openshift.io/controlPlaneTopology:HighlyAvailable volumes.kubernetes.io/controller-managed-attach-detach:true], in cluster bootstrap, loading initial node annotation from /etc/machine-config-daemon/node-annotations.json
I0625 13:15:48.390405 2666 node.go:52] Setting initial node config: rendered-worker-ac10cd07b163bf86670395119e135489
I0625 13:15:48.402032 2666 daemon.go:1682] In bootstrap mode
I0625 13:15:48.408802 2666 daemon.go:1620] Previous boot ostree-finalize-staged.service appears successful
I0625 13:15:48.408850 2666 daemon.go:1755] Current+desired config: rendered-worker-ac10cd07b163bf86670395119e135489
I0625 13:15:48.408870 2666 daemon.go:1770] state: Done
I0625 13:15:48.408902 2666 update.go:2741] Running: rpm-ostree cleanup -r
Bootloader updated; bootconfig swap: yes; bootversion: boot.1.1, deployment count change: -1
E0625 13:15:58.244754 2666 writer.go:231] Marking Degraded due to: "failed to remove rollback: error running rpm-ostree cleanup -r: : signal: terminated"
I0625 13:15:58.282753 2666 daemon.go:759] Transitioned from state: Done -> Degraded
I0625 13:15:58.282772 2666 daemon.go:762] Transitioned from degraded/unreconcilable reason -> failed to remove rollback: error running rpm-ostree cleanup -r: : signal: terminated
I0625 13:15:58.286390 2666 daemon.go:2032] Running: /run/machine-config-daemon-bin/nmstatectl persist-nic-names --root / --kargs-out /tmp/nmstate-kargs4193065622 --cleanup
Expected results:
The pool should not become degraded.
Additional info:
I have only checked the cleanup command. The pool may also become degraded when the node is shut down while the MCO is running any other rpm-ostree command, but I have not verified this.

Slack conversation: https://redhat-internal.slack.com/archives/GH7G2MANS/p1750857107077139
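
For context on the "signal: terminated" text in the error: this is the string that Go's os/exec package reports when a child process is killed by SIGTERM, which is what typically happens to running processes during a node shutdown. The sketch below is only an illustration of that behavior, not the MCO's actual code. The function names runAndTerminate and isShutdownSignal are hypothetical, and "sleep 30" is an assumed stand-in for rpm-ostree cleanup -r so the example can run on any Unix-like host.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
)

// runAndTerminate starts a long-running child process (a stand-in for
// "rpm-ostree cleanup -r"), sends it SIGTERM to mimic a shutdown, and
// returns the error reported by Wait. Hypothetical helper.
func runAndTerminate() error {
	cmd := exec.Command("sleep", "30")
	if err := cmd.Start(); err != nil {
		return err
	}
	if err := cmd.Process.Signal(syscall.SIGTERM); err != nil {
		return err
	}
	return cmd.Wait()
}

// isShutdownSignal reports whether err comes from a child process that was
// killed by SIGTERM, as opposed to one that exited with a non-zero status.
// Hypothetical helper; Unix-only because it relies on syscall.WaitStatus.
func isShutdownSignal(err error) bool {
	var exitErr *exec.ExitError
	if !errors.As(err, &exitErr) {
		return false
	}
	ws, ok := exitErr.Sys().(syscall.WaitStatus)
	return ok && ws.Signaled() && ws.Signal() == syscall.SIGTERM
}

func main() {
	err := runAndTerminate()
	fmt.Printf("error: %v\n", err)
	fmt.Printf("terminated by SIGTERM: %v\n", isShutdownSignal(err))
}
```

On a Linux host this prints "error: signal: terminated" followed by "terminated by SIGTERM: true". The same check would apply to any other command interrupted by a shutdown, which is consistent with the open question above about other rpm-ostree commands, although that case remains untested in this report.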