Bug
Resolution: Unresolved
Normal
None
4.18
Moderate
None
False
Description of problem:
When the reboot process is broken, an MCDRebootError alert should be raised. However, the alert is not raised, and the MCP is degraded with the wrong message:

E1028 17:22:38.515751 45330 writer.go:226] Marking Degraded due to: failed to update OS to quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3: error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3: error: Old and new refs are equal: ostree-unverified-registry:quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3

Even after the reboot process is fixed, the node cannot recover and remains stuck reporting the "Old and new refs are equal" error.
Version-Release number of selected component (if applicable):
IPI on AWS:

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.18.0-0.nightly-2024-10-28-052434   True        False         8h      Error while reconciling 4.18.0-0.nightly-2024-10-28-052434: an unknown error has occurred: MultipleErrors
How reproducible:
Always
Steps to Reproduce:
1. Enable OCL.
2. Break the reboot:
   $ oc debug node/$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") -- chroot /host sh -c "mount -o remount,rw /usr; mv /usr/bin/systemd-run /usr/bin/systemd-run2"
   Starting pod/sregidor-ver1-w48rv-worker-a-rln2vcopenshift-qeinternal-debug ...
   To use host binaries, run `chroot /host`
3. Wait for an MCDRebootError alert to be raised and check that the MCP is degraded with the message: "reboot command failed, something is seriously wrong"
Actual results:
The MCDRebootError alert is not raised and the MCP is degraded with the wrong message:

- lastTransitionTime: "2024-10-28T16:40:43Z"
  message: 'Node ip-10-0-51-0.us-east-2.compute.internal is reporting: "failed to update OS to quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3: error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3: error: Old and new refs are equal: ostree-unverified-registry:quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3\n: exit status 1"'
  reason: 1 nodes are reporting degraded status on sync
  status: "True"
  type: NodeDegraded
Expected results:
The MCDRebootError alert should be raised and the MCP should be degraded with the right message: "reboot command failed, something is seriously wrong".
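
To tell the expected degraded message apart from the wrong one while reproducing, a small helper along these lines may be useful. This is only a sketch: the function name `classify_degraded_message` and its output labels are hypothetical, and the commented `oc` query assumes the affected pool is named `worker`.

```shell
#!/bin/sh
# Hypothetical helper: classify an MCP NodeDegraded message.
# Prints "expected" for the reboot-failure message this report expects,
# "wrong-message" for the rpm-ostree rebase error actually observed,
# and "unknown" for anything else.
classify_degraded_message() {
  case "$1" in
    *"reboot command failed, something is seriously wrong"*)
      echo "expected" ;;
    *"Old and new refs are equal"*)
      echo "wrong-message" ;;
    *)
      echo "unknown" ;;
  esac
}

# Example against a live cluster (assumption: the pool is named "worker"):
# msg=$(oc get mcp worker -o jsonpath='{.status.conditions[?(@.type=="NodeDegraded")].message}')
# classify_degraded_message "$msg"
```

With the behaviour described in this report, the helper would print "wrong-message" when OCL is enabled, and "expected" when OCL is disabled.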
Additional info:
If OCL is disabled, this functionality works as expected.