OpenShift Bugs / OCPBUGS-30926

SriovNetworkNodeState error "-E- Failed to open the device"

Details

    • Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • 4.16
    • Networking / SR-IOV
    • None
    • No
    • CNF Network Sprint 251
    • 1
    • False
    Description

      Description of problem:

      After installing the SR-IOV Network Operator, if a node has a Mellanox SR-IOV NIC, the SriovNetworkNodeState goes to Failed status. The sriov-network-config-daemon logs
      contain the error:
      
      2024-03-13T12:41:07.158337613Z	INFO	daemon/daemon.go:546	mellanox plugin OnNodeStateChange()
      
      2024-03-13T12:41:07.158367088Z	INFO	kernel/kernel.go:582	RunCommand()	{"command": "/bin/sh", "args": ["-c", "cat", "/host/sys/kernel/security/lockdown"]}
      2024-03-13T12:41:07.164472541Z	LEVEL(-2)	kernel/kernel.go:582	RunCommand()	{"output": "", "error": null}
      2024-03-13T12:41:07.164512706Z	LEVEL(-2)	mellanox/mellanox_plugin.go:82	IsKernelLockdownMode()	{"output": "", "error": null}
      2024-03-13T12:41:07.164585388Z	INFO	mellanox/mellanox_plugin.go:156	mellanox-plugin getMlnxNicFwData()	{"device": "0000:d8:00.0"}
      2024-03-13T12:41:07.164611158Z	INFO	mellanox/mellanox.go:180	MstConfigReadData()	{"device": "0000:d8:00.0"}
      2024-03-13T12:41:07.164626797Z	INFO	mellanox/mellanox.go:78	RunCommand()	{"command": "mstconfig", "args": ["-e", "-d", "0000:d8:00.0", "q"]}
      2024-03-13T12:41:07.177660098Z	LEVEL(-2)	mellanox/mellanox.go:78	RunCommand()	{"output": "-E- Failed to open the device\n", "error": "exit status 3"}
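
      The log above shows the lockdown check returning an empty string before `mstconfig` fails. As a minimal sketch (not the operator's actual code), the Go snippet below shows how the contents of `/sys/kernel/security/lockdown` are usually interpreted: the kernel lists the modes on one line and marks the active one with square brackets. The function name `activeLockdownMode` is hypothetical. Note also that with `sh -c`, arguments after the command string become positional parameters rather than arguments to `cat`, so the logged invocation may never read the file. That is an inference from the log line, not something the report confirms.

```go
package main

import (
	"fmt"
	"strings"
)

// activeLockdownMode is a hypothetical helper. It returns the mode marked
// with square brackets in the contents of /sys/kernel/security/lockdown,
// for example "none [integrity] confidentiality".
// It returns "" when no bracketed mode is present, including for empty input.
func activeLockdownMode(content string) string {
	for _, field := range strings.Fields(content) {
		if strings.HasPrefix(field, "[") && strings.HasSuffix(field, "]") {
			return strings.Trim(field, "[]")
		}
	}
	return ""
}

func main() {
	// Typical content on a host booted with Secure Boot.
	fmt.Printf("%q\n", activeLockdownMode("none [integrity] confidentiality\n"))
	// Empty output, as seen in the log above, yields no detected mode.
	fmt.Printf("%q\n", activeLockdownMode(""))
}
```

      With an empty input the helper cannot tell "lockdown disabled" apart from "file not read", which is one reason an empty result deserves its own handling.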
      
          

      How reproducible:

      100%
          

      Steps to Reproduce:

          1. Provision a 4.16 cluster with at least one node that has Mellanox NICs and SecureBoot enabled
          2. Deploy the SR-IOV Network Operator
          3. Run `oc get -n openshift-sriov-network-operator sriovnetworknodestate`
          

      Actual results:

      The worker with the Mellanox NIC has SYNC STATUS == Failed
          

      Expected results:

      All workers have SYNC STATUS == Succeeded
          

      Additional info:

      Mellanox NICs need SecureBoot disabled to work correctly, but the operator should raise an error if and only if the user tries to configure the NIC. The mere presence of such a NIC on a node should not put the node into Failed status.
      https://access.redhat.com/solutions/6643261
          

            People

              apanatto@redhat.com Andrea Panattoni
              apanatto@redhat.com Andrea Panattoni
              Zhanqi Zhao Zhanqi Zhao
              Votes: 0
              Watchers: 4