OpenShift Bugs / OCPBUGS-58099

Node degraded with 'rpm-ostree cleanup -r: : signal: terminated'

      Description of problem:

      
      When an MCD is restarted, it executes the rpm-ostree cleanup -r command. If the node is shut down while this command is running, the pool becomes degraded with this error:
      
        - lastTransitionTime: "2025-06-25T13:15:58Z"
          message: 'Node ip-10-0-83-147.us-east-2.compute.internal is reporting: "failed
            to remove rollback: error running rpm-ostree cleanup -r: : signal: terminated"'
          reason: 1 nodes are reporting degraded status on sync
          status: "True"
          type: NodeDegraded
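
      For context, the "signal: terminated" suffix in the message above is simply how Go's os/exec package reports a child process that was killed by SIGTERM. A minimal standalone sketch (not MCO code; sleep stands in for rpm-ostree) reproduces the exact wording:

      package main

      import (
          "fmt"
          "os/exec"
          "syscall"
          "time"
      )

      func main() {
          // Stand-in for "rpm-ostree cleanup -r": any long-running child behaves the same.
          cmd := exec.Command("sleep", "60")
          if err := cmd.Start(); err != nil {
              panic(err)
          }
          // Simulate systemd tearing the child down during "shutdown now".
          go func() {
              time.Sleep(100 * time.Millisecond)
              _ = cmd.Process.Signal(syscall.SIGTERM)
          }()
          // Wait returns an *exec.ExitError that formats as "signal: terminated".
          fmt.Printf("error running sleep: %v\n", cmd.Wait())
      }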
      
      
      

      Version-Release number of selected component (if applicable):

      4.19.0-0.nightly-2025-06-24-180820
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Scale up a node
      
      2. As soon as the node joins, watch the MCD log until these lines appear:
      
      I0625 13:15:48.408802    2666 daemon.go:1620] Previous boot ostree-finalize-staged.service appears successful
      I0625 13:15:48.408850    2666 daemon.go:1755] Current+desired config: rendered-worker-ac10cd07b163bf86670395119e135489
      I0625 13:15:48.408870    2666 daemon.go:1770] state: Done
      I0625 13:15:48.408902    2666 update.go:2741] Running: rpm-ostree cleanup -r
      
      
      3. Shut down the node with "shutdown now" while this command is running
      
      

      Actual results:

      The worker MCP is permanently degraded with the error mentioned above
      
      We can see these logs in the MCD:
      
      
      I0625 13:15:47.473923    2666 image_manager_helper.go:92] Running captured: systemctl list-units --state=failed --no-legend
      I0625 13:15:47.481529    2666 daemon.go:1827] systemd service state: OK
      I0625 13:15:47.481571    2666 daemon.go:1380] Starting MachineConfigDaemon
      I0625 13:15:47.481626    2666 daemon.go:1387] Enabling Kubelet Healthz Monitor
      I0625 13:15:47.521512    2666 daemon.go:2979] Found 0 requested local packages in the booted deployment
      I0625 13:15:48.357928    2666 daemon.go:670] Node ip-10-0-83-147.us-east-2.compute.internal is not labeled node-role.kubernetes.io/master
      I0625 13:15:48.360434    2666 daemon.go:2032] Running: /run/machine-config-daemon-bin/nmstatectl persist-nic-names --root / --kargs-out /tmp/nmstate-kargs229921083 --cleanup
      [2025-06-25T13:15:48Z INFO  nmstatectl] Nmstate version: 2.2.45
      [2025-06-25T13:15:48Z INFO  nmstatectl::persist_nic] /etc/systemd/network does not exist, no need to clean up
      
      I0625 13:15:48.389643    2666 node.go:23] No machineconfiguration.openshift.io/currentConfig annotation on node ip-10-0-83-147.us-east-2.compute.internal: map[cloud.network.openshift.io/egress-ipconfig:[{"interface":"eni-0464701685a8a3c8f","ifaddr":{"ipv4":"10.0.64.0/19"},"capacity":{"ipv4":14,"ipv6":15}}] k8s.ovn.org/node-gateway-router-lrp-ifaddrs:{"default":{"ipv4":"100.64.0.16/16"}} k8s.ovn.org/node-id:16 k8s.ovn.org/node-subnets:{"default":["10.130.6.0/23"]} k8s.ovn.org/node-transit-switch-port-ifaddr:{"ipv4":"100.88.0.16/16"} machine.openshift.io/machine:openshift-machine-api/sregidor-v19-cpjc4-worker-us-east-2c-xshr7 machineconfiguration.openshift.io/controlPlaneTopology:HighlyAvailable volumes.kubernetes.io/controller-managed-attach-detach:true], in cluster bootstrap, loading initial node annotation from /etc/machine-config-daemon/node-annotations.json
      I0625 13:15:48.390405    2666 node.go:52] Setting initial node config: rendered-worker-ac10cd07b163bf86670395119e135489
      I0625 13:15:48.402032    2666 daemon.go:1682] In bootstrap mode
      I0625 13:15:48.408802    2666 daemon.go:1620] Previous boot ostree-finalize-staged.service appears successful
      I0625 13:15:48.408850    2666 daemon.go:1755] Current+desired config: rendered-worker-ac10cd07b163bf86670395119e135489
      I0625 13:15:48.408870    2666 daemon.go:1770] state: Done
      I0625 13:15:48.408902    2666 update.go:2741] Running: rpm-ostree cleanup -r
      Bootloader updated; bootconfig swap: yes; bootversion: boot.1.1, deployment count change: -1
      E0625 13:15:58.244754    2666 writer.go:231] Marking Degraded due to: "failed to remove rollback: error running rpm-ostree cleanup -r: : signal: terminated"
      I0625 13:15:58.282753    2666 daemon.go:759] Transitioned from state: Done -> Degraded
      I0625 13:15:58.282772    2666 daemon.go:762] Transitioned from degraded/unreconcilable reason  -> failed to remove rollback: error running rpm-ostree cleanup -r: : signal: terminated
      I0625 13:15:58.286390    2666 daemon.go:2032] Running: /run/machine-config-daemon-bin/nmstatectl persist-nic-names --root / --kargs-out /tmp/nmstate-kargs4193065622 --cleanup
      
      
      

      Expected results:

      No degradation should happen
      

      Additional info:

      
      I haven't tried other rpm-ostree commands. The node may also become degraded when it is shut down while the MCO is running any other rpm-ostree command, but I'm not sure; I've only checked the cleanup command. A possible way to classify these interruptions is sketched below.
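
      A hypothetical handling sketch (this is not the actual MCO code, and isKilledByShutdownSignal is an invented helper name): the daemon could inspect the exec error and treat an rpm-ostree child killed by SIGTERM/SIGKILL as a transient failure to retry on the next sync, instead of marking the node Degraded. Because the check only looks at the wait status, it would apply to any rpm-ostree subcommand, not just cleanup -r:

      package main

      import (
          "errors"
          "fmt"
          "os/exec"
          "syscall"
      )

      // Hypothetical helper: report whether the command failed because the child
      // was killed by a shutdown-style signal (Unix-only wait status check).
      func isKilledByShutdownSignal(err error) bool {
          var exitErr *exec.ExitError
          if !errors.As(err, &exitErr) {
              return false
          }
          ws, ok := exitErr.Sys().(syscall.WaitStatus)
          if !ok {
              return false
          }
          return ws.Signaled() && (ws.Signal() == syscall.SIGTERM || ws.Signal() == syscall.SIGKILL)
      }

      func main() {
          // Same stand-in as above: a long-running child interrupted by SIGTERM.
          cmd := exec.Command("sleep", "60")
          if err := cmd.Start(); err != nil {
              panic(err)
          }
          _ = cmd.Process.Signal(syscall.SIGTERM)
          if err := cmd.Wait(); isKilledByShutdownSignal(err) {
              fmt.Println("transient: interrupted by shutdown, retry on next sync instead of degrading")
          } else if err != nil {
              fmt.Println("real failure, mark degraded:", err)
          }
      }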
      
      Slack conversation:
      https://redhat-internal.slack.com/archives/GH7G2MANS/p1750857107077139
      
      

      Assignee: Isabella Janssen (rh-ee-ijanssen)
      Reporter: Sergio Regidor de la Rosa (sregidor@redhat.com)