OpenShift Bugs: OCPBUGS-47549

Invalid JSON output from `lshw` on the Phantom Lake Smart-NIC causes Bare Metal Operator failures

    • Metal Platform 264, Metal Platform 268

      Description of problem:

          When the Bare Metal Operator live-boots a CoreOS image from a RAM disk to inspect the Phantom Lake card, it goes into a fail loop.
      
      The fail loop occurs because ironic-agent hits a JSONDecodeError while parsing the output of lshw.
      
      Specifically, the memory section of the output has an unescaped `\` character in the `serial` field, which causes the parsing to fail.
      ---snip---
                "children" : [            {
                    "id" : "bank:0",
                    "class" : "memory",
                    "claimed" : true,
                    "handle" : "DMI:0015",
                    "description" : "DIMM DDR4 Synchronous 3200 MHz (0.3 ns)",
                    "physid" : "0",
                    "serial" : "\",
                    "slot" : "CPU0_DIMM_A1",
                    "units" : "bytes",
                    "size" : 17179869184,
                    "width" : 64,
                    "clock" : 3200000000
                  },
      ---snip---
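
The failure mode can be reproduced outside the agent with Python's standard `json` module. The sketch below is illustrative only: `BROKEN` is a hand-written fragment modeled on the excerpt above, and `escape_trailing_backslash` is a hypothetical helper, not part of ironic-python-agent or lshw. It relies on a heuristic (lshw prints one key per line, so a lone backslash directly before a line-ending quote is assumed to be an unescaped literal) and is not a proposed fix.

```python
import json
import re

# Hand-written fragment modeled on the lshw excerpt: the serial value is a
# single unescaped backslash, so the parser reads \" as an escaped quote.
BROKEN = '{\n  "serial" : "\\",\n  "slot" : "CPU0_DIMM_A1"\n}'

# An odd run of backslashes right before a quote that closes the line
# (optionally followed by a comma). Heuristic, assumes one key per line.
_TRAILING_BACKSLASH = re.compile(r'(?<!\\)(?:\\\\)*\\(?="[ \t]*,?[ \t]*$)', re.MULTILINE)


def escape_trailing_backslash(text):
    """Hypothetical helper: double a lone backslash that ends a string value."""
    return _TRAILING_BACKSLASH.sub(lambda m: m.group(0) + "\\", text)


if __name__ == "__main__":
    try:
        json.loads(BROKEN)
    except json.JSONDecodeError as err:
        # The string runs past the intended closing quote into the newline.
        print("parse failed:", err)

    repaired = json.loads(escape_trailing_backslash(BROKEN))
    print("serial after repair:", repr(repaired["serial"]))
```

Because `\"` is a valid JSON escape, the parser does not stop at the backslash itself. It keeps reading the string until it reaches the raw newline, which is why the reported error is an invalid control character rather than an invalid escape.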

      Version-Release number of selected component (if applicable):

      OpenShift 4.17.9    

      How reproducible:

      Always

      Steps to Reproduce:

          1. Install these YAML files
      ---
      # Documentation: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.10/html-single/clusters/index#enable-cim-create-provision
      apiVersion: metal3.io/v1alpha1
      kind: Provisioning
      metadata:
        name: provisioning-configuration
        namespace: open-cluster-management
      spec:
        provisioningMacAddresses:
          # IMPORTANT: The provisioningMacAddresses is on purpose: do not change this unless you absolutely know what you're doing (hint: read docs first)!
        - aa:aa:aa:aa:aa:aa
        provisioningNetwork: Disabled
        watchAllNamespaces: true
      ---
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        name: metal3-ptl-bmh-2
        namespace: openshift-machine-api
        annotations:
          inspect.metal3.io: disabled
      spec:
        bmc:
          address:  redfish-virtualmedia://wsfd-advnetlab230-intel-ptl-bmc.anl.eng.bos2.dc.redhat.com/redfish/v1/Systems/system
          credentialsName: metal3-ptl-credentials
          disableCertificateVerification: true
        bootMACAddress: 88:dc:97:44:ed:8c
        description: "Phantom Lake"
        online: true
        automatedCleaningMode: metadata
        image:
          url: https://file.brq.redhat.com/wizhao/RHEL-9.4.0-20240527.75-x86_64-dvd1-w-kickstart-2.iso
          format: live-iso
        rootDeviceHints:
          deviceName: "/dev/mmcblk0"
          2. When the boot ISO is run from ramdisk, inspect the ironic-python-agent pod
          3. The agent log shows the following:
      Dec 20 20:44:08 wsfd-advnetlab230-intel-ptl.anl.eng.bos2.dc.redhat.com ironic-agent[1978]: 2024-12-20 20:44:08.066 1 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): lshw -quiet -json execute /usr/lib/python3.9/site-packages/oslo_concurrency/processutils.py:390
      Dec 20 20:44:08 wsfd-advnetlab230-intel-ptl.anl.eng.bos2.dc.redhat.com podman[1952]: 2024-12-20 20:44:08.066 1 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): lshw -quiet -json execute /usr/lib/python3.9/site-packages/oslo_concurrency/processutils.py:390
      Dec 20 20:44:09 wsfd-advnetlab230-intel-ptl.anl.eng.bos2.dc.redhat.com ironic-agent[1978]: 2024-12-20 20:44:09.553 1 DEBUG oslo_concurrency.processutils [-] CMD "lshw -quiet -json" returned: 0 in 1.487s execute /usr/lib/python3.9/site-packages/oslo_concurrency/processutils.py:428
      Dec 20 20:44:09 wsfd-advnetlab230-intel-ptl.anl.eng.bos2.dc.redhat.com ironic-agent[1978]: 2024-12-20 20:44:09.553 1 DEBUG ironic_lib.utils [-] Command stderr is: "" _log /usr/lib/python3.9/site-packages/ironic_lib/utils.py:107
      Dec 20 20:44:09 wsfd-advnetlab230-intel-ptl.anl.eng.bos2.dc.redhat.com podman[1952]: 2024-12-20 20:44:09.553 1 DEBUG oslo_concurrency.processutils [-] CMD "lshw -quiet -json" returned: 0 in 1.487s execute /usr/lib/python3.9/site-packages/oslo_concurrency/processutils.py:428
      Dec 20 20:44:09 wsfd-advnetlab230-intel-ptl.anl.eng.bos2.dc.redhat.com podman[1952]: 2024-12-20 20:44:09.553 1 DEBUG ironic_lib.utils [-] Command stderr is: "" _log /usr/lib/python3.9/site-packages/ironic_lib/utils.py:107
      Dec 20 20:44:09 wsfd-advnetlab230-intel-ptl.anl.eng.bos2.dc.redhat.com ironic-agent[1978]: 2024-12-20 20:44:09.554 1 ERROR root [-] Unexpected error dispatching list_network_interfaces to manager <ironic_python_agent.hardware.GenericHardwareManager object at 0x7f9b75049f70>: Invalid control character at: line 331 column 30 (char 11066): json.decoder.JSONDecodeError: Invalid control character at: line 331 column 30 (char 11066)
      Dec 20 20:44:09 wsfd-advnetlab230-intel-ptl.anl.eng.bos2.dc.redhat.com ironic-agent[1978]: 2024-12-20 20:44:09.554 1 ERROR root Traceback (most recent call last):
      Dec 20 20:44:09 wsfd-advnetlab230-intel-ptl.anl.eng.bos2.dc.redhat.com ironic-agent[1978]: 2024-12-20 20:44:09.554 1 ERROR root   File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 3492, in dispatch_to_managers
      Dec 20 20:44:09 wsfd-advnetlab230-intel-ptl.anl.eng.bos2.dc.redhat.com podman[1952]: 2024-12-20 20:44:09.554 1 ERROR root [-] Unexpected error dispatching list_network_interfaces to manager <ironic_python_agent.hardware.GenericHardwareManager object at 0x7f9b75049f70>: Invalid control character at: line 331 column 30 (char 11066): json.decoder.JSONDecodeError: Invalid control character at: line 331 column 30 (char 11066)
      Dec 20 20:44:09 wsfd-advnetlab230-intel-ptl.anl.eng.bos2.dc.redhat.com podman[1952]: 2024-12-20 20:44:09.554 1 ERROR root Traceback (most recent call last):
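
The JSONDecodeError in the log reports a position (line 331 column 30), which can be mapped back to the offending field in saved `lshw -quiet -json` output. The following is a minimal sketch of that lookup, assuming the output has been captured into a string. `decode_error_context` is a hypothetical helper name and the sample text is hand-written, not real lshw output.

```python
import json


def decode_error_context(text, radius=1):
    """Return numbered lines around a JSON decode error, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        lines = text.splitlines()
        # err.lineno is 1-based; clamp the window to the document bounds.
        first = max(err.lineno - 1 - radius, 0)
        last = min(err.lineno + radius, len(lines))
        return [f"{n + 1}: {lines[n]}" for n in range(first, last)]
    return None


if __name__ == "__main__":
    # Hand-written sample with the same defect as the serial field in this report.
    sample = '{\n  "physid" : "0",\n  "serial" : "\\",\n  "slot" : "CPU0_DIMM_A1"\n}'
    for line in decode_error_context(sample) or []:
        print(line)
```

The window is kept a line wide on each side because the reported position points at the character where the decoder gave up, which may be just past the field that is actually malformed.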

      Actual results:

      Bare Metal Operator does not proceed beyond trying to inspect the HW and enters a failure loop.

      Expected results:

      Bare Metal Operator should be able to inspect the HW with lshw properly.   

      Additional info:

          

              rhn-engineering-dtantsur Dmitry Tantsur
              wizhao@redhat.com William Zhao
              Jad Haj Yahya Jad Haj Yahya