  1. OpenShift Bugs
  2. OCPBUGS-39593

NMI error occurs when booting ThinkSystem SR630 system with RHCOS 4.16 ISO

    • Important
    • None
    • False

      Description of problem:

          The bare-metal systems (ThinkSystem SR630 V2) get an NMI error while booting from the RHCOS 4.16 ISO. No NMI is observed on the same system with version 4.15, or even with RHEL 9.4; only RHCOS 4.16 gets an NMI. The NMI initially caused the systems to reboot continuously. The continuous reboots were fixed, but the systems still get an NMI. Based on the dmesg logs, the system initially gets the NMI due to a GHES failure. Disabling GHES with `ghes.disable=1` does not help; the system still gets an NMI.
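
When retesting with `ghes.disable=1`, it can help to confirm that the argument actually reached the running kernel. The following is a minimal sketch, not part of the original report: it assumes a Linux host where `/proc/cmdline` is readable, and the helper names `kernel_args`, `ghes_disabled`, and `read_current_cmdline` are hypothetical. It splits on whitespace only, so quoted arguments containing spaces are not handled.

```python
from __future__ import annotations

from pathlib import Path


def kernel_args(cmdline: str) -> dict[str, str | None]:
    """Split a kernel command line into {name: value}.

    Flags without '=' (for example 'ro') map to None. If a name appears
    more than once, the last occurrence is kept.
    """
    args: dict[str, str | None] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        args[name] = value if sep else None
    return args


def ghes_disabled(cmdline: str) -> bool:
    """Return True if ghes.disable=1 is present on the given command line."""
    return kernel_args(cmdline).get("ghes.disable") == "1"


def read_current_cmdline() -> str:
    """Read the arguments the running kernel was booted with (Linux only)."""
    return Path("/proc/cmdline").read_text()
```

On the affected system, `ghes_disabled(read_current_cmdline())` should return `True` after booting with the argument; if it returns `False`, the argument was not applied and the test result says nothing about GHES.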

      Version-Release number of selected component (if applicable):

          4.16

      How reproducible:

          Boot a Lenovo ThinkSystem SR630 V2 with the RHCOS 4.16 ISO and check whether the system gets an NMI error.

      Steps to Reproduce:

          1. Boot a Lenovo ThinkSystem SR630 V2 with the RHCOS 4.16 ISO.
          2. Check whether the system gets an NMI error in the web console and in the dmesg logs.
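
The dmesg check in step 2 can be sketched as a small filter. This is an illustrative helper rather than the procedure used in this report: the names `find_nmi_lines` and `read_dmesg` are hypothetical, the match is a case-sensitive search for the whole words `NMI` or `GHES`, and reading the kernel log with `dmesg` may require root privileges.

```python
from __future__ import annotations

import re
import subprocess

# Whole-word, case-sensitive match for the two keywords named in this report.
PATTERN = re.compile(r"\b(NMI|GHES)\b")


def find_nmi_lines(log_text: str) -> list[str]:
    """Return the lines of a kernel log that mention NMI or GHES."""
    return [line for line in log_text.splitlines() if PATTERN.search(line)]


def read_dmesg() -> str:
    """Capture the kernel ring buffer by running the dmesg command."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `find_nmi_lines(read_dmesg())` on the booted system returns the matching lines; an empty list means neither keyword appears in the current ring buffer. The same function can be applied to a saved log file by passing its contents as a string.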
          

      Actual results:

          The system gets an NMI while booting.

      Expected results:

          The system should not get an NMI and boot normally.    

      Additional info:

          - Drive link with NMI error 
      
      Link - https://docs.google.com/document/d/1Jn_HKqULOA15PIvTdzDUpBwHFjSAB9BPzD4HT_M1R5U/edit?usp=sharing
      
      - System details - 
      
      ThinkSystem SR630 V2
       - XCC updated to the latest version, 5.10
       - UEFI updated to the latest version, 3.30
       - LXPM updated to the latest version (which includes support for RHEL 9.4 whereas previous FW only supported 9.1 & made the issue worse), from XWL224B to XWL218E (3.21 to 3.27)
      
      - Drive link with dmesg logs from the system.
      
      Link - https://drive.google.com/drive/folders/1J_0ouZ1MN_IDmfRlQdecwJWvde-MW96n?usp=sharing  
      
      - Slack thread raised for the issue - 
      
      https://redhat-internal.slack.com/archives/C999USB0D/p1725458005269619

       

              Unassigned Unassigned
              rhn-support-adikulka Aditya Kulkarni
              Michael Nguyen Michael Nguyen