Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: 4.19
Affects Version/s: 4.18
Component/s: Machine Config Operator
Labels:
Severity: Moderate
Regression: None
Epic Link: Unified Update Interface
Blocked: False
Blocked Reason: None
Target Version: 4.19
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

When the reboot process is broken, an MCDRebootError alert should be raised. However, with OCL enabled the alert is not raised, and the MCP is degraded with the wrong message:

E1028 17:22:38.515751   45330 writer.go:226] Marking Degraded due to: failed to update OS to quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3: error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3: error: Old and new refs are equal: ostree-unverified-registry:quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3

Even after the reboot process is fixed, the node cannot be recovered and remains stuck reporting the "Old and new refs are equal" error.

Version-Release number of selected component (if applicable):

IPI on AWS:
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.18.0-0.nightly-2024-10-28-052434   True        False         8h      Error while reconciling 4.18.0-0.nightly-2024-10-28-052434: an unknown error has occurred: MultipleErrors

How reproducible:

Always

Steps to Reproduce:

    1. Enable OCL
    2. Break the reboot

$ oc debug node/$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") -- chroot /host sh -c "mount -o remount,rw /usr; mv /usr/bin/systemd-run /usr/bin/systemd-run2"
Starting pod/sregidor-ver1-w48rv-worker-a-rln2vcopenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`

    3. Wait for an MCDRebootError alert to be raised and check that the MCP is degraded with the message: "reboot command failed, something is seriously wrong"
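The check in step 3 comes down to comparing the NodeDegraded message against the expected string and the wrong one observed in this bug. A minimal sketch, assuming a POSIX shell; `classify_degraded_message` is a hypothetical helper written for illustration, not part of the MCO or `oc`.

```shell
# Hypothetical helper (not part of the MCO or oc): classifies the text of a
# MachineConfigPool NodeDegraded condition message for this reproducer.
classify_degraded_message() {
  case "$1" in
    *"reboot command failed, something is seriously wrong"*)
      # Message the MCD is expected to report when the reboot fails
      echo "expected"
      ;;
    *"Old and new refs are equal"*)
      # Wrong message observed when OCL is enabled
      echo "bug"
      ;;
    *)
      echo "unknown"
      ;;
  esac
}
```

On a live cluster the message could be taken from `oc get mcp worker -o jsonpath='{.status.conditions[?(@.type=="NodeDegraded")].message}'`, assuming the affected node belongs to the `worker` pool; with OCL enabled, the relevant pool may be a different one.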

Actual results:


   The MCDRebootError alert is not raised, and the MCP is degraded with the wrong message:

  - lastTransitionTime: "2024-10-28T16:40:43Z"
    message: 'Node ip-10-0-51-0.us-east-2.compute.internal is reporting: "failed to
      update OS to quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3:
      error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3:
      error: Old and new refs are equal: ostree-unverified-registry:quay.io/mcoqe/layering@sha256:c56f19230be27cbc595d9467bcbc227858e097964ac5e5e7e74c5242aaca61e3\n:
      exit status 1"'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded

Expected results:

   The MCDRebootError alert should be raised, and the MCP should be degraded with the expected message: "reboot command failed, something is seriously wrong".

Additional info:

    If OCL is disabled, this functionality works as expected.

links to

openshift/machine-config-operator#4811: OCPBUGS-43896: Ensure that build jobs are always reconciled

openshift/machine-config-operator#4825: OCPBUGS-43896: add revert logic to OCL path in MCD

Assignee: Zack Zlotnik

Reporter: Sergio Regidor de la Rosa

QA Contact: Sergio Regidor de la Rosa

Votes: 0

Watchers: 3

Created: 2024/10/28 5:33 PM

Updated: 2025/03/12 3:06 PM