OCPBUGS-16248

SR-IOV VFs are not created until all the nodes in the pool are updated


    • Important
    • CNF Network Sprint 239
    • Customer Escalated
    • 7/13: upstream PR merged, as well as downstream in 4.14

      This is a clone of issue OCPBUGS-10323. The following is the description of the original issue:

      Description of problem:

      When a new MachineConfig is applied to an MCP, the VFs on nodes that have already been drained and rebooted are not re-created until the entire pool has finished updating. The sriov-network-config-daemon waits for the MCP to become ready before it drains the node and creates the VFs.
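
      To make the ordering easier to follow, below is a minimal Go sketch of the behaviour the logs show; it is not the operator's actual code, and all of the names in it are illustrative. When a plugin requests a drain, the daemon blocks until the MCP reports ready, and only then drains the node and configures the VFs:

      package main

      import (
          "fmt"
          "time"
      )

      // pollMCPReady stands in for the daemon's readiness gate on the MachineConfigPool;
      // here it is faked with a channel so that the sketch terminates on its own.
      func pollMCPReady(ready <-chan struct{}) {
          for {
              select {
              case <-ready:
                  fmt.Println("MCP is ready")
                  return
              default:
                  fmt.Println("MCP worker is not ready, wait...")
                  time.Sleep(100 * time.Millisecond)
              }
          }
      }

      func main() {
          ready := make(chan struct{})
          // Simulate the rest of the pool finishing its update a little later.
          go func() {
              time.Sleep(300 * time.Millisecond)
              close(ready)
          }()

          fmt.Println("plugin requested drain (desired NumVfs != current)")
          pollMCPReady(ready)          // blocks until every node in the pool is updated
          fmt.Println("drain node")    // only now is the node drained...
          fmt.Println("configure VFs") // ...and the VFs finally created
      }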

      Below are the events from a cluster where an MC was applied to a worker MCP that has 6 nodes:

      The new MC being applied to worker-2:

      I0315 08:52:34.317552       1 node_controller.go:436] Pool worker: Setting node worker-2.ocp4.shiftvirt.com target to rendered-worker-d1a8861c0fa4c9d7be2db1af7125b3ba
      I0315 08:52:34.412815       1 node_controller.go:446] Pool worker: node worker-2.ocp4.shiftvirt.com: changed annotation machineconfiguration.openshift.io/desiredConfig = rendered-worker-d1a8861c0fa4c9d7be2db1af7125b3ba
      I0315 08:52:44.585116       1 drain_controller.go:139] node worker-2.ocp4.shiftvirt.com: initiating cordon (currently schedulable: true)
      I0315 08:52:44.809751       1 drain_controller.go:139] node worker-2.ocp4.shiftvirt.com: cordon succeeded (currently schedulable: false)
      I0315 08:52:44.809795       1 drain_controller.go:139] node worker-2.ocp4.shiftvirt.com: initiating drain

      The update of worker-2 completed at 09:00:

      I0315 09:00:50.280547       1 drain_controller.go:139] node worker-2.ocp4.shiftvirt.com: uncordon succeeded (currently schedulable: true)
      I0315 09:00:50.280571       1 drain_controller.go:139] node worker-2.ocp4.shiftvirt.com: operation successful; applying completion annotation
      I0315 09:00:53.699594       1 node_controller.go:446] Pool worker: node worker-2.ocp4.shiftvirt.com: Completed update to rendered-worker-d1a8861c0fa4c9d7be2db1af7125b3ba


      The sriov-network-config-daemon started on the rebooted node, and the generic plugin requested a node drain:


      I0315 09:00:42.486107    3704 utils.go:249] NeedUpdate(): NumVfs needs update desired=5, current=0
      I0315 09:00:42.486120    3704 generic_plugin.go:172] generic-plugin needDrainNode(): need drain, PF 0000:19:00.1 request update
      I0315 09:00:42.486131    3704 generic_plugin.go:125] generic-plugin tryEnableIommuInKernelArgs()
      I0315 09:00:42.510279    3704 daemon.go:478] nodeStateSyncHandler(): plugin generic_plugin: reqDrain true, reqReboot false
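
      The drain decision boils down to comparing the desired and current number of VFs per PF. The sketch below illustrates it with a hypothetical, trimmed-down interface-state type rather than the operator's real types:

      package main

      import "fmt"

      // ifaceState is a hypothetical stand-in for the operator's interface spec/status types.
      type ifaceState struct {
          PciAddress string
          NumVfs     int
      }

      // needDrainNode mimics the decision visible in the log above: if any PF's current
      // NumVfs differs from the desired value, the node has to be drained before the
      // change is applied.
      func needDrainNode(desired, current []ifaceState) bool {
          currentByPci := map[string]int{}
          for _, c := range current {
              currentByPci[c.PciAddress] = c.NumVfs
          }
          for _, d := range desired {
              if currentByPci[d.PciAddress] != d.NumVfs {
                  fmt.Printf("NumVfs needs update desired=%d, current=%d, PF %s requests drain\n",
                      d.NumVfs, currentByPci[d.PciAddress], d.PciAddress)
                  return true
              }
          }
          return false
      }

      func main() {
          desired := []ifaceState{{PciAddress: "0000:19:00.1", NumVfs: 5}}
          current := []ifaceState{{PciAddress: "0000:19:00.1", NumVfs: 0}}
          fmt.Println("need drain:", needDrainNode(desired, current))
      }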

      The daemon tried to pause the MCP, but the pool was still in Updating status because it was still updating the other nodes:

      I0315 09:00:46.788534    3704 daemon.go:868] pauseMCP():MCP worker is not ready: [{RenderDegraded False 2023-03-14 13:22:58 +0000 UTC  } {NodeDegraded False 2023-03-15 07:14:42 +0000 UTC  } {Degraded False 2023-03-15 07:14:42 +0000 UTC  } {Updated False 2023-03-15 08:46:34 +0000 UTC  } {Updating True 2023-03-15 08:46:34 +0000 UTC  All nodes are updating to rendered-worker-d1a8861c0fa4c9d7be2db1af7125b3ba}], wait...


      This check is at https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/f53e1d5f75f8fe2336aec82ea47ed054773810dc/pkg/daemon/daemon.go#L826 
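
      The gist of that gate, sketched here with locally defined stand-ins for the MachineConfigPool condition types (the real check lives behind the link above and uses the machine-config-operator API), is that the pool must not be degraded or updating and must report Updated=True:

      package main

      import "fmt"

      // Local stand-ins for the MachineConfigPool condition fields the daemon inspects.
      type mcpCondition struct {
          Type   string // e.g. "Updated", "Updating", "Degraded"
          Status string // "True" or "False"
      }

      type mcpStatus struct {
          Conditions []mcpCondition
      }

      // isMCPReady: because "Updating" stays True until every node in the pool has the
      // new rendered config, an already-rebooted node keeps waiting here.
      func isMCPReady(status mcpStatus) bool {
          cond := func(t string) string {
              for _, c := range status.Conditions {
                  if c.Type == t {
                      return c.Status
                  }
              }
              return "Unknown"
          }
          return cond("Degraded") == "False" &&
              cond("Updating") == "False" &&
              cond("Updated") == "True"
      }

      func main() {
          // Conditions as seen at 09:00:46 in the log: the pool is still updating other nodes.
          status := mcpStatus{Conditions: []mcpCondition{
              {Type: "Degraded", Status: "False"},
              {Type: "Updated", Status: "False"},
              {Type: "Updating", Status: "True"},
          }}
          fmt.Println("MCP worker ready:", isMCPReady(status)) // false until the whole pool finishes
      }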

      The next node is being updated:

      I0315 09:01:05.504487       1 drain_controller.go:139] node work0.ocp4.shiftvirt.com: cordoning
      I0315 09:01:05.504515       1 drain_controller.go:139] node work0.ocp4.shiftvirt.com: initiating cordon (currently schedulable: true)
      I0315 09:01:05.562782       1 drain_controller.go:139] node work0.ocp4.shiftvirt.com: cordon succeeded (currently schedulable: false)


      The MCP update completed successfully at 09:08:

      I0315 09:08:00.807071       1 node_controller.go:446] Pool worker: node work0.ocp4.shiftvirt.com: Completed update to rendered-worker-d1a8861c0fa4c9d7be2db1af7125b3ba
      I0315 09:08:00.825989       1 status.go:90] Pool worker: All nodes are updated with rendered-worker-d1a8861c0fa4c9d7be2db1af7125b3ba


      The daemon was not able to pause the MCP until all the nodes were updated and the MCP became ready:

      I0315 09:07:46.765462    3704 daemon.go:868] pauseMCP():MCP worker is not ready: [{RenderDegraded False 2023-03-14 13:22:58 +0000 UTC  } {NodeDegraded False 2023-03-15 07:14:42 +0000 UTC  } {Degraded False 2023-03-15 07:14:42 +0000 UTC  } {Updated False 2023-03-15 08:46:34 +0000 UTC  } {Updating True 2023-03-15 08:46:34 +0000 UTC  All nodes are updating to rendered-worker-d1a8861c0fa4c9d7be2db1af7125b3ba}], wait...
      
      I0315 09:08:00.905893    3704 daemon.go:828] pauseMCP(): MCP worker is ready
      I0315 09:08:00.905943    3704 daemon.go:838] pauseMCP(): pause MCP worker
      I0315 09:08:00.935898    3704 daemon.go:690] annotateNode(): Annotate node worker-2.ocp4.shiftvirt.com with: Draining_MCP_Paused
      I0315 09:08:01.052167    3704 daemon.go:828] pauseMCP(): MCP worker is ready
      I0315 09:08:01.052221    3704 daemon.go:830] pauseMCP(): stop MCP informer
      I0315 09:08:01.052365    3704 daemon.go:518] nodeStateSyncHandler(): drain node
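
      Pausing the pool amounts to setting spec.paused on the MachineConfigPool, so that the MCO does not start updating further nodes while the SR-IOV daemon drains and reconfigures this one. A minimal sketch of the merge patch involved (the clientset plumbing is omitted, and the exact call shown in the comment is an assumption):

      package main

      import (
          "encoding/json"
          "fmt"
      )

      // buildPausePatch builds the merge patch that toggles spec.paused on an MCP.
      func buildPausePatch(paused bool) ([]byte, error) {
          patch := map[string]interface{}{
              "spec": map[string]interface{}{
                  "paused": paused,
              },
          }
          return json.Marshal(patch)
      }

      func main() {
          pause, err := buildPausePatch(true)
          if err != nil {
              panic(err)
          }
          fmt.Println(string(pause)) // {"spec":{"paused":true}}
          // In the real daemon the patch would be applied through the machineconfiguration
          // clientset, roughly: MachineConfigPools().Patch(ctx, "worker",
          // types.MergePatchType, pause, metav1.PatchOptions{}).
      }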

      Only at that point does it finally set the VFs:

      I0315 09:08:07.750480    3704 utils.go:417] setSriovNumVfs(): set NumVfs for device 0000:19:00.1 to 5

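      Setting NumVfs ultimately comes down to a sysfs write on the PF. The following is a bare-bones sketch under that assumption; the operator's real setSriovNumVfs handles more cases (vendor quirks, error paths) than shown here:

      package main

      import (
          "fmt"
          "os"
          "path/filepath"
      )

      // setSriovNumVfs writes the desired VF count to the PF's sriov_numvfs file.
      // The kernel requires the value to be reset to 0 before a different non-zero
      // value can be written, hence the two writes.
      func setSriovNumVfs(pciAddr string, numVfs int) error {
          path := filepath.Join("/sys/bus/pci/devices", pciAddr, "sriov_numvfs")
          if err := os.WriteFile(path, []byte("0"), 0o644); err != nil {
              return fmt.Errorf("reset sriov_numvfs for %s: %w", pciAddr, err)
          }
          if err := os.WriteFile(path, []byte(fmt.Sprint(numVfs)), 0o644); err != nil {
              return fmt.Errorf("set sriov_numvfs for %s: %w", pciAddr, err)
          }
          return nil
      }

      func main() {
          // PCI address taken from the log line above.
          if err := setSriovNumVfs("0000:19:00.1", 5); err != nil {
              fmt.Fprintln(os.Stderr, err)
          }
      }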

      In effect, an MCP update requires downtime for all SR-IOV-enabled VirtualMachines.

      The customer who reported the issue is using OpenShift Virtualization where they run SR-IOV based VMs.


      Version-Release number of selected component (if applicable):

      4.12.1

      How reproducible:

      100%

      Steps to Reproduce:

      1. Apply a new MachineConfig to an MCP that contains more than one node with SR-IOV networks configured.
      2. Observe that on the rebooted nodes the VFs are not created until all the nodes are updated and the MCP status changes to ready (see the verification sketch after these steps).
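
      To verify step 2, the following small helper (not part of the operator; the PF address is taken from the logs above) counts the PF's virtfn entries in sysfs; on an affected node it stays at 0 until the whole pool has finished updating:

      package main

      import (
          "fmt"
          "os"
          "path/filepath"
      )

      // countVFs counts the virtfnN symlinks that the kernel creates under the PF's
      // sysfs directory once VFs exist.
      func countVFs(pciAddr string) (int, error) {
          matches, err := filepath.Glob(filepath.Join("/sys/bus/pci/devices", pciAddr, "virtfn*"))
          if err != nil {
              return 0, err
          }
          return len(matches), nil
      }

      func main() {
          n, err := countVFs("0000:19:00.1")
          if err != nil {
              fmt.Fprintln(os.Stderr, err)
              return
          }
          fmt.Printf("VFs present: %d\n", n)
      }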

      Actual results:

      SR-IOV VFs are not created until all the nodes in the pool are updated

      Expected results:

      It should be possible to update an MCP without VM downtime. Currently, the VMs are live-migrated to other nodes during the update, but they cannot be scheduled back onto an already-updated node because its VFs are not re-created after the update. As a result, the user has to shut down the VMs for the nodes to complete the MCP update.

      Additional info:

