Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-8068

The node's last_error disappears briefly on cleaning failure

XMLWordPrintable

    • Important
    • No
    • 2
    • Metal Platform 232, Metal Platform 233, Metal Platform 234, Metal Platform 235
    • 4
    • Rejected
    • False
    • Hide

      None

      Show
      None

      It is caused by the power off routine, which initialises last_error to None. The field is later restored, but BMO manages to observe and record the wrong value.

      This issue is not trivial to reproduce in the product. You need OCPBUGS-2471 to land first, then you need to trigger the cleaning failure several times. I used direct access to Ironic via CLI to abort cleaning (`baremetal node abort <node name>`) during deprovisioning. After a few attempts you can observe the following in the BMH's status:

      status:
        errorCount: 2
        errorMessage: 'Cleaning failed: '
        errorType: provisioning error

      The empty message after the colon is a sign of this bug.

              rhn-engineering-dtantsur Dmitry Tantsur
              rhn-engineering-dtantsur Dmitry Tantsur
              Jad Haj Yahya Jad Haj Yahya
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Created:
                Updated:
                Resolved: