OpenShift Bugs / OCPBUGS-8068

The node's last_error disappears briefly on cleaning failure

Details

    • Important
    • No
    • 2
    • Metal Platform 232, Metal Platform 233, Metal Platform 234, Metal Platform 235
    • 4
    • Rejected
    • False

    Description

      The problem is caused by the power-off routine, which resets last_error to None. The field is restored later, but in the meantime BMO can observe and record the wrong (empty) value.
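
The sequence described above can be illustrated with a toy model. This is only a sketch of the timing problem, not Ironic or BMO code: `ToyNode`, `power_off`, `format_bmh_error`, and the error text are hypothetical names and values made up for illustration, and the real code paths differ.

```python
class ToyNode:
    """Hypothetical stand-in for an Ironic node; not the real data model."""

    def __init__(self):
        self.provision_state = "clean failed"
        self.last_error = "Cleaning aborted by user"  # made-up message


def power_off(node, observe):
    """Toy power-off routine: clears last_error, then restores it."""
    saved = node.last_error
    node.last_error = None   # the field disappears here
    observe(node)            # an observer reading at this moment sees None
    node.last_error = saved  # the field is restored afterwards


def format_bmh_error(node):
    """Toy version of how an observer might build its error message."""
    return "Cleaning failed: " + (node.last_error or "")


recorded = []
node = ToyNode()
power_off(node, lambda n: recorded.append(format_bmh_error(n)))

print(repr(recorded[0]))             # message captured inside the window
print(repr(format_bmh_error(node)))  # message built after the restore
```

In this model the observer that reads the node inside the window records a message with nothing after the colon, while a read after the restore gets the full text.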

      This issue is not trivial to reproduce in the product. You need OCPBUGS-2471 to land first, then you need to trigger the cleaning failure several times. I used direct access to Ironic via CLI to abort cleaning (`baremetal node abort <node name>`) during deprovisioning. After a few attempts you can observe the following in the BMH's status:

      status:
        errorCount: 2
        errorMessage: 'Cleaning failed: '
        errorType: provisioning error

      The empty message after the colon is a sign of this bug.
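
When repeating the reproduction several times, a small check for this symptom may be convenient. The sketch below assumes the BMH status has already been loaded into a Python dict with the same fields as shown above; the function name `looks_like_empty_cleaning_error` is hypothetical.

```python
def looks_like_empty_cleaning_error(status):
    """Return True if errorMessage is 'Cleaning failed:' with nothing after the colon."""
    message = status.get("errorMessage") or ""
    prefix = "Cleaning failed:"
    return message.startswith(prefix) and message[len(prefix):].strip() == ""


# Status fields copied from the example in this report.
status = {
    "errorCount": 2,
    "errorMessage": "Cleaning failed: ",
    "errorType": "provisioning error",
}
print(looks_like_empty_cleaning_error(status))
```

A message that carries any text after the colon, or a status with no errorMessage at all, is not flagged.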

          People

            Dmitry Tantsur (rhn-engineering-dtantsur)
            Jad Haj Yahya