  1. OpenShift Bugs
  2. OCPBUGS-5497

MCDRebootError alarm disappears after 15 minutes

      Description of problem:

      When a node fails to reboot, an MCDRebootError alarm is raised. This alarm disappears after 15 minutes, even if the machine was never rebooted.
      
      

      Version-Release number of selected component (if applicable):

      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.13.0-0.nightly-2022-12-22-120609   True        False         26m     Cluster version is 4.13.0-0.nightly-2022-12-22-120609
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Run these commands on a worker node to break the reboot process.
      
      $ mount -o remount,rw /usr
      $ mv /usr/bin/systemd-run /usr/bin/systemd-run2
      
      2. Create any MachineConfig (MC). For example, this one:
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: test-file
      spec:
        config:
          ignition:
            version: 3.1.0
          storage:
            files:
            - contents:
                source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
              filesystem: root
              mode: 0644
              path: /etc/test
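
To see what the example MachineConfig writes to /etc/test, the base64 payload from its data URL can be decoded locally. This is a minimal sketch, assuming a `base64` binary that accepts `-d` (GNU coreutils); the variable names are illustrative and not part of the original report.

```shell
# Payload copied verbatim from the MachineConfig "source" field
# (the part after "base64,").
payload='c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK'

# Decode it to preview the file content that will land in /etc/test.
# Assumption: base64 supports -d (GNU coreutils).
decoded="$(printf '%s' "$payload" | base64 -d)"

printf '%s\n' "$decoded"
```

The decoded content is three `server ... maxdelay 0.4 offline` lines, so the file itself is harmless; any MachineConfig that triggers a reboot should reproduce the issue.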
      
      
      

      Actual results:

      An MCDRebootError alarm is triggered, but it disappears after 15 minutes.
      
      

      Expected results:

      The alarm should not disappear after 15 minutes. It should remain there until the node is rebooted.
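
One way to verify the expected behavior is to poll the alarm state for longer than 15 minutes and fail as soon as it clears. The helper below is only a sketch: `alert_stays_firing` is a hypothetical function name, and the check command passed to it (something that exits 0 while MCDRebootError is firing) is an assumption that has to be supplied for the cluster under test.

```shell
# Poll a check command and report whether it ever stops succeeding.
# usage: alert_stays_firing <attempts> <interval_seconds> <check command...>
# The check command is hypothetical: it must exit 0 while the alarm is firing.
alert_stays_firing() {
  attempts=$1
  interval=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the caller-supplied check; stop at the first failure.
    if ! "$@"; then
      echo "alert no longer firing at check $i of $attempts"
      return 1
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "alert still firing after $attempts checks"
  return 0
}
```

For example, `alert_stays_firing 20 60 my_check` would check once per minute for 20 minutes, where `my_check` stands in for whatever command reports the alarm state. With the behavior described in this bug, such a run would be expected to fail at roughly the 15-minute mark.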
      
      
      

      Additional info:

      This PR seems to have introduced this behavior:
      https://github.com/openshift/machine-config-operator/pull/3406#discussion_r1030481908
      
      
      

       

            djoshy David Joshy
            sregidor@redhat.com Sergio Regidor de la Rosa
            Sergio Regidor de la Rosa Sergio Regidor de la Rosa