Bug
Resolution: Done-Errata
Normal
4.18
Quality / Stability / Reliability
False
5
Moderate
In Progress
Release Note Not Required
N/A
Description of problem:
When the reboot process is broken, an MCDRebootError alert should be raised. However, the alert is not raised, and the MCP is degraded with the wrong message:
E1028 17:22:38.515751 45330 writer.go:226] Marking Degraded due to: failed to update OS to quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3: error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3: error: Old and new refs are equal: ostree-unverified-registry:quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3
Even after the reboot process is fixed, the node does not recover and remains stuck reporting the "Old and new refs are equal" error.
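A minimal sketch of what "fixing the reboot process" could look like, assuming it simply means undoing the `systemd-run` rename from step 2 of the reproduction steps. The helper names `restore_script` and `restore_reboot` are hypothetical and not part of any tool; the `oc debug` invocation mirrors the one used in this report. Per the behavior described above, the node is expected to stay stuck even after this is run.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the node name is supplied by the caller.

# Print the host-side script that reverses the rename from the reproduction step.
restore_script() {
    printf '%s' 'mount -o remount,rw /usr; mv /usr/bin/systemd-run2 /usr/bin/systemd-run'
}

# Run the restore script on the given node through a debug pod.
restore_reboot() {
    node="$1"
    oc debug "node/$node" -- chroot /host sh -c "$(restore_script)"
}

# Usage (needs a logged-in oc session):
#   restore_reboot "$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath='{.items[0].metadata.name}')"
```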
Version-Release number of selected component (if applicable):
IPI on AWS:
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.18.0-0.nightly-2024-10-28-052434   True        False         8h      Error while reconciling 4.18.0-0.nightly-2024-10-28-052434: an unknown error has occurred: MultipleErrors
How reproducible:
Always
Steps to Reproduce:
1. Enable OCL (on-cluster layering)
2. Break the reboot process by renaming systemd-run on a worker node
$ oc debug node/$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") -- chroot /host sh -c "mount -o remount,rw /usr; mv /usr/bin/systemd-run /usr/bin/systemd-run2"
Starting pod/sregidor-ver1-w48rv-worker-a-rln2vcopenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
3. Wait for an MCDRebootError alert to be raised and check that the MCP is degraded with the message "reboot command failed, something is seriously wrong"
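The check in step 3 could be scripted roughly as follows. This is a sketch under assumptions: the pool is assumed to be named `worker`, and the helper names `get_node_degraded_message` and `has_expected_reboot_error` are hypothetical. It only inspects the NodeDegraded message of the pool; it does not query the alert itself.

```shell
#!/bin/sh
# Sketch only: the pool name "worker" and the helper names are assumptions.

EXPECTED_MSG='reboot command failed, something is seriously wrong'

# Print the NodeDegraded condition message of a MachineConfigPool.
get_node_degraded_message() {
    pool="${1:-worker}"
    oc get mcp "$pool" \
        -o jsonpath='{.status.conditions[?(@.type=="NodeDegraded")].message}'
}

# Succeed (exit 0) only if the given message contains the expected reboot error.
has_expected_reboot_error() {
    case "$1" in
        *"$EXPECTED_MSG"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (needs a logged-in oc session):
#   if has_expected_reboot_error "$(get_node_degraded_message worker)"; then
#       echo "MCP reports the expected reboot error"
#   else
#       echo "MCP reports an unexpected message"
#   fi
```

With the message shown under "Actual results", `has_expected_reboot_error` fails, because the pool reports the "Old and new refs are equal" error instead of the reboot failure.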
Actual results:
The MCDRebootError alert is not raised, and the MCP is degraded with the wrong message:
  - lastTransitionTime: "2024-10-28T16:40:43Z"
    message: 'Node ip-10-0-51-0.us-east-2.compute.internal is reporting: "failed to
      update OS to quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3:
      error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3:
      error: Old and new refs are equal: ostree-unverified-registry:quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3\n:
      exit status 1"'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded
Expected results:
The MCDRebootError alert should be raised and the MCP should be degraded with the correct message ("reboot command failed, something is seriously wrong").
Additional info:
If OCL is disabled, this functionality works as expected.
Links to:
RHEA-2024:11038 (OpenShift Container Platform 4.19.z bug fix update)