OpenShift Bugs / OCPBUGS-77911

BMH stuck in `provisioning` state while provisioning NETAS NCS6722N4 systems


      Description of problem:

      While installing a managed cluster using the ACM ZTP approach, with BMH details provided for hardware provisioning, some of the BMHs get stuck in the `provisioning` state. The corresponding agent is stuck in the `Rebooting` state.
      
      The issue is observed on some, but not all, nodes. The ISO is not ejected from the BMC, so the nodes appear to boot from the ISO again, and the node ends up stuck in `installing-pending-user-action`.
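      To see which hosts are affected before intervening, the BMH and Agent objects on the hub can be filtered by state. This is a minimal sketch, assuming the JSON comes from `oc get bmh -n <namespace> -o json` and `oc get agents -n <namespace> -o json`, and assuming the state is read from `.status.provisioning.state` (BareMetalHost) and `.status.debugInfo.state` (Agent). `stuck_bmhs` and `agents_awaiting_user_action` are hypothetical helpers, not part of any tool.

~~~python
# Hypothetical triage helpers for this report.
# Assumed field paths (verify against the CRDs on the hub):
#   BareMetalHost: .status.provisioning.state  e.g. "provisioning"
#   Agent:         .status.debugInfo.state     e.g. "installing-pending-user-action"


def stuck_bmhs(bmh_list):
    """Names of BareMetalHosts whose provisioning state is still 'provisioning'."""
    return [
        item["metadata"]["name"]
        for item in bmh_list.get("items", [])
        if item.get("status", {}).get("provisioning", {}).get("state")
        == "provisioning"
    ]


def agents_awaiting_user_action(agent_list):
    """Names of Agents reporting the installing-pending-user-action state."""
    return [
        item["metadata"]["name"]
        for item in agent_list.get("items", [])
        if item.get("status", {}).get("debugInfo", {}).get("state")
        == "installing-pending-user-action"
    ]
~~~

      For example, `stuck_bmhs(json.load(open("bmh.json")))`, where `bmh.json` is a hypothetical file holding the `oc get bmh` output.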
      
      Manually ejecting the ISO from the BMC and rebooting the node fixes the issue.
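      The manual eject can also be issued directly against the BMC's Redfish API. The sketch below only builds the request, assuming the action URI seen in the metal3 log quoted below (`Managers/1/VirtualMedia/CD`) and HTTP Basic authentication; `build_eject_request` is a hypothetical helper. It sends an empty JSON object because `VirtualMedia.EjectMedia` takes no parameters; whether this BMC firmware (04.25.02.10) accepts that body has not been verified here.

~~~python
import base64
import json
import urllib.request


def build_eject_request(bmc, manager_id, media_id, username, password):
    """Build (but do not send) a Redfish VirtualMedia.EjectMedia POST.

    Hypothetical helper. The path layout mirrors the URI in the metal3 log;
    strictly, the target should be read from the VirtualMedia resource's
    Actions["#VirtualMedia.EjectMedia"]["target"].
    """
    url = (
        f"https://{bmc}/redfish/v1/Managers/{manager_id}/VirtualMedia/"
        f"{media_id}/Actions/VirtualMedia.EjectMedia"
    )
    # HTTP Basic credentials (an assumption; some setups use Redfish sessions).
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode(),  # parameterless action: empty JSON object
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
~~~

      To send it, pass the request to `urllib.request.urlopen(req, context=ctx)` with an `ssl.SSLContext` that trusts the BMC certificate, then re-read the VirtualMedia resource to confirm the media is gone.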
      
      Looking at the metal3 logs, I found that 6 other nodes also failed to eject the ISO, yet those nodes are fine and did not have issues. The following is the error for one of them:
      
      ~~~
      2026-02-17T15:39:56.489996714Z 2026-02-17 15:39:56.489 1 DEBUG sushy.exceptions [None req-a2e3bef6-05d3-4be9-917a-12e0931d035b - - - - - -] HTTP response for POST https://10.141.228.34/redfish/v1/Managers/1/VirtualMedia/CD/Actions/VirtualMedia.EjectMedia: status code: 400, error: Base.1.0.UnrecognizedRequestBody: The service detected a malformed request body that it was unable to interpret., extended: [{'@odata.type': '#Message.v1_0_5.Message', 'Message': 'The service detected a malformed request body that it was unable to interpret.', 'MessageArgs': [], 'MessageId': 'Base.1.0.UnrecognizedRequestBody', 'RelatedProperties': [], 'Resolution': 'Correct the request body and resubmit the request if it failed.', 'Severity': 'Warning'}] __init__ /usr/lib/python3.9/site-packages/sushy/exceptions.py:122
      ~~~
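      To list every node that hit this error rather than a single example, the metal3 logs can be scanned for failed `EjectMedia` calls and grouped by BMC address. This is a minimal sketch: the regular expression assumes the exact message layout of the sushy DEBUG line quoted above, which may differ in other sushy versions, and `failed_ejects` is a hypothetical helper.

~~~python
import re
from collections import defaultdict

# Assumes sushy's "HTTP response for POST <url>: status code: <n>, error: <MessageId>:"
# wording, restricted to the Redfish VirtualMedia.EjectMedia action.
EJECT_FAILURE = re.compile(
    r"HTTP response for POST https?://(?P<bmc>[^/\s]+)\S*/VirtualMedia\.EjectMedia"
    r": status code: (?P<status>\d+), error: (?P<message_id>[\w.]+):"
)


def failed_ejects(lines):
    """Map each BMC address to the (status, MessageId) pairs of its failed ejects."""
    failures = defaultdict(list)
    for line in lines:
        match = EJECT_FAILURE.search(line)
        if match:
            failures[match["bmc"]].append((int(match["status"]), match["message_id"]))
    return dict(failures)
~~~

      For example, `failed_ejects(open("metal3.log"))` over a log file taken from the must-gather (file name hypothetical); iterating a file object yields its lines.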
      
      ACM version - 2.13
      MCE version - 4.8.4
      Managed cluster version - 4.16.56
      Hardware - NETAS (backed by ZTE)
      Model name - NCS6722N4
      BMC firmware version - 04.25.02.10

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      Set up the hub cluster for ACM ZTP installation on NETAS NCS6722N4 servers via BMH and start the installation. The installation media is not detached once provisioning is done and the node is expected to boot from disk.

      Steps to Reproduce:

          1. Set up the hub cluster for ACM ZTP deployment with the BMHs for the systems and start the installation.
          2. The BMH gets stuck in the `provisioning` state and the node boots from the ISO instead of the disk. The ISO is not ejected.
          3. Manually eject the ISO and reboot the node to resolve the issue.
          

      Actual results:

       The installation media is not detached from the server, so the node boots from the installation media again and the installation fails.

      Expected results:

       The installation media should get detached from the server.
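      One way to check the expected result is to read the BMC's VirtualMedia resource after provisioning (for example `GET /redfish/v1/Managers/1/VirtualMedia/CD`, the path from the log above) and confirm nothing is attached. The sketch assumes the standard Redfish `Inserted` and `Image` properties; BMCs differ in which of them they populate after an eject, so the result is a hint rather than proof. `media_is_ejected` is a hypothetical helper.

~~~python
import json


def media_is_ejected(virtual_media_json):
    """Return True if a Redfish VirtualMedia resource reports no attached media.

    Hypothetical helper. Requires Inserted to be false or absent and Image to be
    null, empty, or absent.
    """
    resource = json.loads(virtual_media_json)
    return not resource.get("Inserted", False) and not resource.get("Image")
~~~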

      Additional info:

      - Managed cluster name in hub - gbu4cpocp2
      - The following link contains the ACM must-gather and the cluster installation logs.
      
      [+] https://drive.google.com/drive/folders/1agHpOE-NgGe6TKYws3ZCFw9Ub7f9cTD7?usp=drive_link
      
      - A thread was raised in forum-ocp-metal-platform, where a bug was requested.
      
      [+] https://redhat-internal.slack.com/archives/CFP6ST0A3/p1772597125510309

       

              jadha Jad Haj Yahya
              rhn-support-adikulka Aditya Kulkarni
              Jad Haj Yahya Jad Haj Yahya