Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-56056

Day2 firmware update on HPE fails when trying to power on server

XMLWordPrintable

    • Quality / Stability / Reliability
    • False
    • Hide

      None

      Show
      None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Description of problem:

      Attempts to run day2 update on an HPE ProLiant DL110 Gen11, provisioned with assisted-service, fails in the final stages with the following error in the BMH:
      
        errorMessage: 'Node 845a8264-7970-414e-a2b2-5b292bedee0d failed step {''args'':
          {''settings'': [{''component'': ''bios'', ''url'': ''http://hv14.telco5gran.eng.rdu2.redhat.com:8888/firmware/hpe/BIOS-U62_1.48_10_05_2023.fwpkg''}]},
          ''interface'': ''firmware'', ''step'': ''update'', ''abortable'': False, ''priority'':
          0}: Failed to set node power state to power on.'
        errorType: servicing error
      
      The BIOS firmware upgrade itself is successful, but the final reboot does not happen, reporting this error.
      
      
      2025-05-12 16:50:50.743 1 DEBUG sushy.connector [-] HTTP request: GET https://10.6.36.56/redfish/v1/Systems/1; headers: {'OData-Version': '4.0', 'Accept-Encoding': 'identity'}; body: None; blocking: False; timeout: 60; session arguments: {}; _op /usr/lib/python3.9/site-packages/sushy/connector.py:157
      2025-05-12 16:50:51.054 1 DEBUG sushy.connector [-] HTTP response for GET https://10.6.36.56/redfish/v1/Systems/1: status code: 200 _op /usr/lib/python3.9/site-packages/sushy/connector.py:306                                                                                                                                                                                       2025-05-12 16:50:51.055 1 DEBUG sushy.resources.base [-] Received representation of System /redfish/v1/Systems/1: {'_actions': {'reset': {'allowed_values': ['On', 'ForceOff', 'GracefulShutdown', 'ForceRestart', 'Nmi', 'PushPowerButton', 'GracefulRestart'], 'operation_apply_time_support': None, 'target_uri': '/redfish/v1/Systems/1/Actions/ComputerSystem.Reset'}}, '_oem_vendors': ['Hpe'], '_settings': None, 'asset_tag': '', 'bios_version': 'U62 v1.48 (10/05/2023)', 'boot': {'allowed_values': ['None', 'Cd', 'Hdd', 'Usb', 'SDCard', 'Utilities', 'Diags', 'BiosSetup', 'Pxe', 'UefiShell', 'UefiHttp', 'UefiTarget'], 'enabled': <BootSourceOverrideEnabled.ONCE: 'Once'>, 'http_boot_uri': None, 'mode': <BootSourceOverrideMode.UEFI: 'UEFI'>, 'target': <BootSource.CD: 'Cd'>}, 'boot_progress': {'last_boot_seconds_count': 0, 'last_state': <BootProgressStates.PRIMARY_PROCESSOR: 'PrimaryProcessorInitializationStarted'>, 'last_state_updated_at': None, 'oem_last_state': None}, 'description': None, 'hostname': 'cnfdg43', 'identity': '1', 'indicator_led': None, 'links': {'oem_vendors': None}, 'maintenance_window': None, 'manufacturer': 'HPE', 'memory_summary': {'health': 'OK', 'size_gib': 128}, 'name': 'Computer System', 'part_number': None, 'power_state': None, 'serial_number': '2M2D0V03V2', 'sku': 'P54277-B21', 'status': {'health': <Health.OK: 'OK'>, 'health_rollup': <Health.OK: 'OK'>, 'state': <State.STARTING: 'Starting'>}, 'system_type': <SystemType.PHYSICAL: 'Physical'>, 'uuid': '32343550-3737-4D32-3244-305630335632'} refresh /usr/lib/python3.9/site-packages/sushy/resources/base.py:697                                                                                        2025-05-12 16:50:51.056 1 ERROR oslo.service.loopingcall [-] Dynamic backoff interval looping call 'ironic.conductor.utils.node_wait_for_power_state.<locals>._wait' failed: oslo_service.loopingcall.LoopingCallTimeOut: Looping call timed out after 161.11 seconds
      2025-05-12 16:50:51.056 1 ERROR oslo.service.loopingcall Traceback (most recent call last):
      2025-05-12 16:50:51.056 1 ERROR oslo.service.loopingcall   File "/usr/lib/python3.9/site-packages/oslo_service/loopingcall.py", line 154, in _run_loop
      2025-05-12 16:50:51.056 1 ERROR oslo.service.loopingcall     idle = idle_for_func(result, self._elapsed(watch))
      2025-05-12 16:50:51.056 1 ERROR oslo.service.loopingcall   File "/usr/lib/python3.9/site-packages/oslo_service/loopingcall.py", line 349, in _idle_for
      2025-05-12 16:50:51.056 1 ERROR oslo.service.loopingcall     raise LoopingCallTimeOut(
      2025-05-12 16:50:51.056 1 ERROR oslo.service.loopingcall oslo_service.loopingcall.LoopingCallTimeOut: Looping call timed out after 161.11 seconds
      2025-05-12 16:50:51.056 1 ERROR oslo.service.loopingcall                                                                                                                                   2025-05-12 16:50:51.058 1 ERROR ironic.conductor.utils [None req-19544a4a-fbd0-46c2-bacf-53a516ca768e - - - - - -] Timed out after 120 secs waiting for power on on node 845a8264-7970-414e-a2b2-5b292bedee0d.: oslo_service.loopingcall.LoopingCallTimeOut: Looping call timed out after 161.11 seconds                                                                              2025-05-12 16:50:51.072 1 ERROR ironic.drivers.utils [None req-19544a4a-fbd0-46c2-bacf-53a516ca768e - - - - - -] Failed to invoke collect_system_logs agent command for node 845a8264-7970-414e-a2b2-5b292bedee0d. Error: Agent driver requires agent_url in driver_internal_info: ironic.common.exception.AgentConnectionFailed: Agent driver requires agent_url in driver_internal_info
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils [None req-19544a4a-fbd0-46c2-bacf-53a516ca768e - - - - - -] Node 845a8264-7970-414e-a2b2-5b292bedee0d failed step {'args': {'settings': [{'component': 'bios', 'url': 'http://hv14.telco5gran.eng.rdu2.redhat.com:8888/firmware/hpe/BIOS-U62_1.48_10_05_2023.fwpkg'}]}, 'interface': 'firmware', 'step': 'update', 'abortable': False, 'priority': 0}: Failed to set node power state to power on.: ironic.common.exception.PowerStateFailure: Failed to set node power state to power on.
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils Traceback (most recent call last):
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/ironic/conductor/utils.py", line 184, in node_wait_for_power_state
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     return timer.start(initial_delay=1, timeout=retry_timeout).wait()
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/eventlet/event.py", line 125, in wait
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     result = hub.switch()
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/eventlet/hubs/hub.py", line 313, in switch
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     return self.greenlet.switch()
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/oslo_service/loopingcall.py", line 154, in _run_loop
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     idle = idle_for_func(result, self._elapsed(watch))
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/oslo_service/loopingcall.py", line 349, in _idle_for
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     raise LoopingCallTimeOut(
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils oslo_service.loopingcall.LoopingCallTimeOut: Looping call timed out after 161.11 seconds
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils During handling of the above exception, another exception occurred:
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils Traceback (most recent call last):
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/ironic/conductor/servicing.py", line 151, in do_next_service_step
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     result = interface.execute_service_step(task, step)
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/ironic/drivers/base.py", line 456, in execute_service_step
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     return self._execute_step(task, step)
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/ironic/drivers/base.py", line 319, in _execute_step
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     return getattr(self, step['step'])(task, **args)
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/ironic_lib/metrics.py", line 60, in wrapped
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     result = f(*args, **kwargs)
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/ironic/drivers/base.py", line 1887, in wrapped
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     result = func(self, task, *args, **kwargs)
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/ironic/drivers/modules/redfish/firmware.py", line 184, in update
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     return deploy_utils.reboot_to_finish_step(task, timeout=wait_interval)
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/ironic/drivers/modules/deploy_utils.py", line 1661, in reboot_to_finish_step
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     manager_utils.node_power_action(task, states.REBOOT, timeout)
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/ironic/conductor/task_manager.py", line 153, in wrapper
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     return f(*args, **kwargs)
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/ironic/conductor/utils.py", line 356, in node_power_action
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     notify_utils.emit_power_set_notification(
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     self.force_reraise()
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     raise self.value
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/ironic/conductor/utils.py", line 344, in node_power_action
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     task.driver.power.reboot(task, timeout=timeout)
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/ironic/conductor/task_manager.py", line 153, in wrapper
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     return f(*args, **kwargs)
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/ironic/drivers/modules/redfish/power.py", line 163, in reboot
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     _set_power_state(task, system, next_state, timeout=timeout)
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/ironic/drivers/modules/redfish/power.py", line 71, in _set_power_state
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     cond_utils.node_wait_for_power_state(task, target_state,
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils   File "/usr/lib/python3.9/site-packages/ironic/conductor/utils.py", line 190, in node_wait_for_power_state
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils     raise exception.PowerStateFailure(pstate=new_state)
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils ironic.common.exception.PowerStateFailure: Failed to set node power state to power on.
      2025-05-12 16:50:51.080 1 ERROR ironic.conductor.utils
      2025-05-12 16:50:51.081 1 DEBUG ironic.conductor.utils [None req-19544a4a-fbd0-46c2-bacf-53a516ca768e - - - - - -] Node 845a8264-7970-414e-a2b2-5b292bedee0d is not fast-track-able because it has an error: Failed to change power state to 'power on' by 'rebooting': Failed to set node power state to power on. is_fast_track /usr/lib/python3.9/site-packages/ironic/conductor/utils.py:1221                                                                                                                                                                                2025-05-12 16:50:51.081 1 DEBUG ironic.drivers.modules.redfish.boot [None req-19544a4a-fbd0-46c2-bacf-53a516ca768e - - - - - -] Cleaning up deploy boot for 845a8264-7970-414e-a2b2-5b292bedee0d clean_up_ramdisk /usr/lib/python3.9/site-packages/ironic/drivers/modules/redfish/boot.py:813                                                                                         
      
          

      Version-Release number of selected component (if applicable):

      OCP: 4.18.10
      MCE: 2.8.1
      
      HPE ILO: iLO 6 v1.68 / Apr 24 2025
          

      How reproducible:

      The HPE BIOS firmware upgrade seems to succeed prior to provisioning, but once provisioned and the preprovisioningImage is provided by assisted-service, this failure occurs almost every time.
          

      Steps to Reproduce:

          1. Custom build of assisted-service is required with support for day2-config annotation, in order to drop the detached/paused annotations
          2. Provision cluster with o-cloud manager / siteconfig / assisted-service
          3. Run day2 firmware update procedure
          

      Actual results:

      
          

      Expected results:

      
          

      Additional info:

      
          

              imelofer Iury Gregory Melo Ferreira
              dpenney1@redhat.com Don Penney
              None
              None
              Jad Haj Yahya Jad Haj Yahya
              None
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Created:
                Updated:
                Resolved: