OpenShift Bugs / OCPBUGS-59259

Firmware upgrade failures get cleared from the BareMetalHost, making it difficult to know what failed


    • Quality / Stability / Reliability
    • False
    • 3
    • 3
    • Critical
    • Metal Platform 274, Metal Platform 275, Metal Platform 276
    • 3
    • In Progress
    • Release Note Not Required

      Description of problem:

      I have the following BMH definition:

      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        name: dell-server
        namespace: hardware-inventory
        annotations:
      spec:
        automatedCleaningMode: disabled
        bmc:
          disableCertificateVerification: True
          address: idrac-virtualmedia://10.6.75.116/redfish/v1/Systems/System.Embedded.1
          credentialsName: dell-bmc-credentials
        bootMACAddress: B4:83:51:00:B4:88
        online: true

      Then I apply the following manifests to run the upgrade:

      ---
      apiVersion: metal3.io/v1alpha1
      kind: HostUpdatePolicy
      metadata:
        name: dell-server
        namespace: hardware-inventory
      spec:
        firmwareSettings: onReboot
        firmwareUpdates: onReboot
      ---
      apiVersion: metal3.io/v1alpha1
      kind: HostFirmwareComponents
      metadata:
        name: dell-server
        namespace: hardware-inventory
      spec:
        updates:
          - component: bmc
            url: http://10.6.116.4:9999/iDRAC-with-Lifecycle-Controller_Firmware_R8V2F_LN64_7.20.30.00_A00.BIN
          - component: bios
            url: http://10.6.116.4:9999/BIOS_0HY8N_LN64_1.17.2.BIN 

      Next, I apply the reboot.metal3.io="" annotation to the BMH so that the BMC firmware and the BIOS get upgraded. The upgrade fails, and this is what I see in the BMO logs:

      {"level":"info","ts":1750857141.5618482,"logger":"controllers.BareMetalHost","msg":"using PreprovisioningImage","baremetalhost":{"name":"dell-server","namespace":"hardware-inventory"},"provisioningState":"available","Image":{"ImageURL":"http://metal3-image-customization-service.openshift-machine-api.svc.cluster.local/d235e339-76f8-4709-ad13-3517db75f539","KernelURL":"","ExtraKernelParams":"","Format":"iso"}}
      {"level":"info","ts":1750857141.5847816,"logger":"provisioner.ironic","msg":"current provision state","host":"hardware-inventory~dell-server","lastError":"Firmware update failed for node c0471e4f-e9c8-4678-9893-77975de4ded2, firmware http://10.6.116.4:9999/iDRAC-with-Lifecycle-Controller_Firmware_R8V2F_LN64_7.20.30.00_A00.BIN. Error: Lifecycle Controller in use. This job will start when Lifecycle Controller is available.","current":"manageable","target":""} 
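
Because the error only shows up on the BMH for a short time, the BMO log is where the actual failure reason survives. The following is a minimal Python sketch that pulls the Ironic failure message out of a saved BMO log. It assumes one JSON object per line, like the two entries above, and that the message lives in the `lastError` field, as in the `provisioner.ironic` entry. The function name `extract_last_errors` is hypothetical and is not part of BMO or Ironic.

```python
import json
from typing import Iterable, List


def extract_last_errors(log_lines: Iterable[str]) -> List[str]:
    """Return each distinct, non-empty lastError value from JSON-formatted BMO log lines.

    Assumption: one JSON object per line, as in the BMO log excerpt above.
    Lines that are blank, not valid JSON, or not JSON objects are skipped.
    """
    errors: List[str] = []
    for raw in log_lines:
        line = raw.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        last_error = entry.get("lastError")
        # An empty lastError means Ironic reported no error at that point; skip it.
        if last_error and last_error not in errors:
            errors.append(last_error)
    return errors
```

The function accepts any iterable of strings, so it works on a list of lines or on an open log file. It keeps the first occurrence of each message, in order, because the same `lastError` is typically repeated on every reconcile.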

      The failure itself is fine. The problem is that the error is then cleared from the BMH. This is the BMH as observed during the upgrade:

      NAME          STATE       CONSUMER   ONLINE   ERROR   AGE
      dell-server   preparing              true             46h
      dell-server   preparing              true     preparation error   46h
      dell-server   preparing              true     preparation error   46h
      dell-server   available              true                         46h 
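
In the output above, `preparation error` appears only for a short window before the host goes back to `available` with an empty ERROR column. The following is a minimal Python sketch of the "watch the object" workaround. It records each error as it appears so the error is not lost when the status is cleared. It assumes that the ERROR column reflects the BMH `status.errorType` field, that the message is in `status.errorMessage`, and that each snapshot is the BMH object as a dict, for example as delivered by a Kubernetes watch on `baremetalhosts`. The function `collect_transient_errors` is hypothetical, and fetching the snapshots is left out.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def collect_transient_errors(
    snapshots: Iterable[Dict[str, Any]],
) -> List[Tuple[str, str, str]]:
    """Record (provisioning state, errorType, errorMessage) whenever a new error appears.

    Assumption: each snapshot is a BareMetalHost object as a dict. A steady
    error seen on consecutive snapshots is recorded once; an error that is
    cleared and then reappears is recorded again.
    """
    seen: List[Tuple[str, str, str]] = []
    previous: Optional[Tuple[str, str]] = None
    for bmh in snapshots:
        status = bmh.get("status") or {}
        error_type = status.get("errorType") or ""
        error_message = status.get("errorMessage") or ""
        state = (status.get("provisioning") or {}).get("state", "")
        if not (error_type or error_message):
            # Error cleared: a later identical error counts as a new occurrence.
            previous = None
            continue
        current = (error_type, error_message)
        if current != previous:
            seen.append((state, error_type, error_message))
        previous = current
    return seen
```

Only changes in the error are recorded. This matches the sequence in the table, where the same `preparation error` shows up on consecutive observations before it disappears.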

      Version-Release number of selected component (if applicable):

      OCP 4.18    

      How reproducible:

      Always

      Steps to Reproduce:

      See the description above.

      Actual results:

      The firmware upgrade fails, and the error is then cleared from the BMH. The user cannot see why the upgrade failed unless they were watching the object during the procedure.

      Expected results:

      The firmware upgrade fails, and the error remains on the BMH until the user resolves the underlying problem.

      Additional info:

      Slack thread: https://redhat-internal.slack.com/archives/CFP6ST0A3/p1750857275820059

              imelofer Iury Gregory Melo Ferreira
              mavazque@redhat.com Mario Vazquez Cebrian
              Jad Haj Yahya
              Votes: 0
              Watchers: 4
