OCPBUGS-62010

BMC F/W update fails on Dell R740, server stuck in power off

    • Quality / Stability / Reliability
    • False
    • 2
    • Critical
    • No
    • x86_64
    • None
    • None
    • Rejected
    • None
    • Proposed
    • Known Issue
    • None
    • None
    • None
    • None

      Description of problem:

      Follow on from OCPBUGS-62009.

      I deleted the BMH and restarted the metal3 pods to recover. 

      I then reprovisioned the BMH and waited for inspection to finish. 

      I then updated the HostFirmwareComponents resource to update only the BMC.

      It actually did update the firmware on the server but metal3 failed to recover the server.

      apiVersion: metal3.io/v1alpha1
      kind: HostFirmwareComponents
      metadata:
        creationTimestamp: "2025-09-21T23:12:18Z"
        generation: 2
        name: r740xdg1
        namespace: r740-pool
        ownerReferences:
        - apiVersion: metal3.io/v1alpha1
          kind: BareMetalHost
          name: r740xdg1
          uid: 9494c650-ad12-4a75-a304-3e9a91b90a1c
        resourceVersion: "18510819"
        uid: d1077d36-c7cd-4647-a4d1-8c7eab631e5e
      spec:
        updates:
        - component: bmc
          url: http://hv14.telco5gran.eng.rdu2.redhat.com:8888/firmware/r740/iDRAC-with-Lifecycle-Controller_Firmware_XTFXJ_WN64_7.00.00.173_A00.EXE
      status:
        components:
        - component: bios
          currentVersion: 2.22.2
          initialVersion: 2.22.2
        - component: bmc
          currentVersion: 7.00.00.181
          initialVersion: 7.00.00.181
        - component: nic:NIC.Integrated.1
          currentVersion: 14.32.20.04
          initialVersion: 14.32.20.04
        conditions:
        - lastTransitionTime: "2025-09-21T23:12:19Z"
          message: ""
          observedGeneration: 2
          reason: OK
          status: "True"
          type: Valid
        - lastTransitionTime: "2025-09-21T23:26:16Z"
          message: ""
          observedGeneration: 2
          reason: OK
          status: "False"
          type: ChangeDetected
        lastUpdated: "2025-09-21T23:26:16Z"
        updates:
        - component: bmc
          url: http://hv14.telco5gran.eng.rdu2.redhat.com:8888/firmware/r740/iDRAC-with-Lifecycle-Controller_Firmware_XTFXJ_WN64_7.00.00.173_A00.EXE
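A quick way to compare the requested package against the reported firmware level is to pull the version string out of the package file name. The following is only a sketch: the helper names and the regular expression are assumptions made for illustration and are not part of metal3 or Ironic. With the values from the resource above, the requested version (7.00.00.173) sorts lower than the reported `currentVersion` (7.00.00.181).

```python
import re

# Values copied from the HostFirmwareComponents resource in this report.
UPDATE_URL = (
    "http://hv14.telco5gran.eng.rdu2.redhat.com:8888/firmware/r740/"
    "iDRAC-with-Lifecycle-Controller_Firmware_XTFXJ_WN64_7.00.00.173_A00.EXE"
)
REPORTED_BMC_VERSION = "7.00.00.181"


def version_from_url(url):
    """Return the four-part dotted version embedded in the package name, or None.

    Assumes the version is delimited by underscores, as in the file name above.
    """
    match = re.search(r"_(\d+(?:\.\d+){3})_", url)
    return match.group(1) if match else None


def as_tuple(version):
    """Convert '7.00.00.173' to (7, 0, 0, 173) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


requested = version_from_url(UPDATE_URL)
is_lower = as_tuple(requested) < as_tuple(REPORTED_BMC_VERSION)
print(f"requested={requested} reported={REPORTED_BMC_VERSION} lower={is_lower}")
```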

      The server is stuck in the powered-off state and is powered off again even if I manually power cycle it.

      The following was seen in the logs:

      {"level":"info","ts":1758498122.6741993,"logger":"provisioner.ironic","msg":"current provision state","host":"r740-pool~r740xdg1","lastError":"Node ea1c7224-e159-4201-84b0-c28cad19046b failed step {'args': {'settings': [{'component': 'bmc', 'url': 'http://hv14.telco5gran.eng.rdu2.redhat.com:8888/firmware/r740/iDRAC-with-Lifecycle-Controller_Firmware_XTFXJ_WN64_7.00.00.173_A00.EXE'}]}, 'interface': 'firmware', 'step': 'update', 'abortable': False, 'priority': 0}: Redfish exception occurred. Error: HTTP POST https://10.6.36.10/redfish/v1/SessionService/Sessions returned code 401. Base.1.12.GeneralError: Unable to complete the operation because an invalid username and/or password is entered, and therefore authentication failed. Extended information: [{'Message': 'Unable to complete the operation because an invalid username and/or password is entered, and therefore authentication failed.', 'MessageArgs': [], 'MessageArgs@odata.count': 0, 'MessageId': 'IDRAC.2.8.SYS415', 'RelatedProperties': [], 'RelatedProperties@odata.count': 0, 'Resolution': 'Enter valid user name and password and retry the operation.', 'Severity': 'Warning'}]","current":"manageable","target":""}
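When scanning many of these entries, it can help to reduce each one to the host, provisioning state, HTTP status, and iDRAC message ID. The sketch below assumes the entry is a single JSON object per line, as the operator emits it. The function name is hypothetical, and the sample is an abbreviated copy of the entry above (the `lastError` text is shortened), not a complete log line.

```python
import json
import re


def summarize_last_error(log_line):
    """Extract a few key fields from one JSON log entry.

    The regular expressions target the wording seen in this report and
    may not match other Redfish error formats.
    """
    entry = json.loads(log_line)
    last_error = entry.get("lastError", "")
    code = re.search(r"returned code (\d+)", last_error)
    message_id = re.search(r"'MessageId': '([^']+)'", last_error)
    return {
        "host": entry.get("host"),
        "state": entry.get("current"),
        "http_status": int(code.group(1)) if code else None,
        "message_id": message_id.group(1) if message_id else None,
    }


# Abbreviated sample built from the entry in this report.
sample = json.dumps({
    "level": "info",
    "logger": "provisioner.ironic",
    "msg": "current provision state",
    "host": "r740-pool~r740xdg1",
    "lastError": (
        "Redfish exception occurred. Error: HTTP POST "
        "https://10.6.36.10/redfish/v1/SessionService/Sessions returned code 401. "
        "Extended information: [{'MessageId': 'IDRAC.2.8.SYS415'}]"
    ),
    "current": "manageable",
    "target": "",
})

summary = summarize_last_error(sample)
print(summary)
```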

      The credentials are clearly valid, as we would not have gotten this far with invalid ones.

      BMC logs (newest first):

          2025-09-22 00:06:51     LOG007     The previous log entry was repeated 1 times.    
              2025-09-21 23:41:18     RAC0720     Unable to locate the ISO or IMG image file or folder in the network share location because the file or folder path or the user credentials entered are incorrect.    
              2025-09-21 23:41:17     USR0030     Successfully logged in using telemetry, from 127.0.0.1 and REDFISH.    
              2025-09-21 23:41:15     RAC0717     Remote share unmounted successfully.    
              2025-09-21 23:40:56     USR0031     Unable to log in for NULL from 10.22.88.128 using eHTML5 Virtual Console.    
              2025-09-21 23:40:56     LOG007     The previous log entry was repeated 1 times.    
              2025-09-21 23:40:49     USR0030     Successfully logged in using root, from 10.8.53.44 and REDFISH.    
              2025-09-21 23:40:37     DIS002     Auto Discovery feature disabled.    
              2025-09-21 23:40:37     RAC0182     The iDRAC firmware was rebooted with the following reason: user initiated.    
              2025-09-21 23:40:27     IPA0100     The iDRAC IP Address changed from :: to 2620:52:9:1624:f602:70ff:fee4:f7f4.    
              2025-09-21 23:40:27     IPA0100     The iDRAC IP Address changed from 0.0.0.0 to 10.6.36.10.    
              2025-09-21 23:40:07     PR36     Version change detected for Lifecycle Controller firmware. Previous version:7.00.00.181, Current version:7.00.00.173    
              2025-09-21 23:40:02     THRM0008     The UNC Warning threshold limit of the server board inlet temperature sensor is changed to 38.    
              2025-09-21 23:39:58     PSU0800     Power Supply 2: Status = 0x1, IOUT = 0x0, VOUT= 0x0, TEMP= 0x0, FAN = 0x0, INPUT= 0x0.    
              2025-09-21 23:39:58     PSU0800     Power Supply 1: Status = 0x1, IOUT = 0x0, VOUT= 0x0, TEMP= 0x0, FAN = 0x0, INPUT= 0x0.    
              2025-09-21 23:37:38     USR0032     The session for root from 10.8.53.44 using REDFISH is logged off.    
              2025-09-21 23:37:36     SYS1001     System is turning off.    
              2025-09-21 23:37:36     SYS1003     System CPU Resetting.    
              2025-09-21 23:37:29     JCP037     The (installation or configuration) job JID_585155450820 is successfully completed.    
              2025-09-21 23:37:29     RED063     The iDRAC firmware updated successfully. Previous version: 7.00.00.181, Current version: 7.00.00.173    
              2025-09-21 23:37:29     RAC0704     Requested system powerdown.    
              2025-09-21 23:37:26     SUP1906     Firmware update successful.    
              2025-09-21 23:36:34     SUP1905     Firmware update programming flash.    
              2025-09-21 23:36:18     SUP1903     Firmware update verify image headers.    
              2025-09-21 23:36:18     SUP1904     Firmware update checksumming image.    
              2025-09-21 23:36:17     SUP1911     Firmware update initialization complete.    
              2025-09-21 23:36:17     SUP1901     Firmware update initializing.    
              2025-09-21 23:34:34     USR0032     The session for root from 10.8.53.44 using REDFISH is logged off.    
              2025-09-21 23:33:23     RED002     Package successfully downloaded.    
              2025-09-21 23:32:46     RED111     Successfully downloaded the update package details 228.097 MB in 16.9493 secs at 13.4576 MBps (107.661 Mbps) [iDRAC-with-Lifecycle-Controller_Firmware_XTFXJ_WN64_7.00.00.173_A00.EXE].    
              2025-09-21 23:32:29     RED110     Downloading the iDRAC-with-Lifecycle-Controller_Firmware_XTFXJ_WN64_7.00.00.173_A00.EXE update package.    
              2025-09-21 23:32:25     JCP027     The (installation or configuration) job JID_585155450820 is successfully created on iDRAC.    
              2025-09-21 23:29:01     CTL129     The boot media of the Controller RAID Controller in Slot 6 is Disk.Virtual.0:RAID.Slot.6-1.    
              2025-09-21 23:26:06     SYS1003     System CPU Resetting.    
              2025-09-21 23:25:43     SYS1000     System is turning on.    
              2025-09-21 23:25:42     RAC0701     Requested system powerup.
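Because the BMC log is listed newest first, reordering it chronologically makes the sequence around the update easier to follow. The sketch below parses a few lines copied from the log above and measures the time between the successful flash (SUP1906) and the iDRAC reboot entry (RAC0182). The parser, its regular expression, and the function names are assumptions for illustration, and the column layout is inferred from this log only.

```python
import re
from datetime import datetime

# Timestamp, message ID, then free-text message, separated by runs of spaces.
LINE_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\S+)\s+(.*\S)\s*$"
)


def parse_bmc_log(text):
    """Return (timestamp, message_id, message) tuples sorted oldest first."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((stamp, match.group(2), match.group(3)))
    return sorted(events, key=lambda event: event[0])


def first_time(events, message_id):
    """Return the earliest timestamp for a message ID, or None if absent."""
    for stamp, event_id, _ in events:
        if event_id == message_id:
            return stamp
    return None


# Lines copied from the BMC log in this report.
excerpt = """
2025-09-21 23:40:49     USR0030     Successfully logged in using root, from 10.8.53.44 and REDFISH.
2025-09-21 23:40:37     RAC0182     The iDRAC firmware was rebooted with the following reason: user initiated.
2025-09-21 23:37:29     RAC0704     Requested system powerdown.
2025-09-21 23:37:26     SUP1906     Firmware update successful.
"""

events = parse_bmc_log(excerpt)
gap = first_time(events, "RAC0182") - first_time(events, "SUP1906")
print(f"{gap.total_seconds():.0f} seconds between flash success and iDRAC reboot entry")
```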

      Must Gather: https://drive.google.com/file/d/1TlEg50A2NqIU8gYCBtwxg_MT1P6Jb8iu/view?usp=drive_link

      If this has the same root cause as OCPBUGS-62009, feel free to mark it as a duplicate.

      Version-Release number of selected component (if applicable):

      4.20-rc.2 

      How reproducible:

      No BMC upgrade has completed successfully on rc.2 yet.

      Steps to Reproduce:

      1.  
      2.  
      3. ...

      Actual results:

      Expected results:

      Additional info:

              janders@redhat.com Jacob Anders
              browsell@redhat.com Brent Rowsell
              Jad Haj Yahya
              Srikanth R