OCPBUGS-23795

SNO: SRIOV Operator long lease acquisition after a controlled shutdown/restart


    • Important
    • No
    • CNF Network Sprint 256, CNF Network Sprint 257, CNF Network Sprint 258, CNF Network Sprint 259
    • 4
    • False
      * Previously, the Single-Root I/O Virtualization (SR-IOV) Operator did not expire the acquired lease during the Operator's shutdown operation. This impacted a new instance of the Operator, because the new instance had to wait for the lease to expire before the new instance was operational. With this release, an update to the Operator shutdown logic ensures that the Operator expires the lease when the Operator is shutting down. (link:https://issues.redhat.com/browse/OCPBUGS-23795[*OCPBUGS-23795*])
    • Bug Fix
    • Done
    • 5/9 - Fix merged in 4.16. Color: Green
      24-03-05: Pinged the upstream community to review the fix PR. A long-term operator refactoring has been proposed in a separate thread.

      Description of problem:

      After a controlled restart of the sriov operator (e.g. a rollout during an upgrade), the new instance takes 5+ minutes to acquire the leader lease on startup:
      
      I1123 13:20:44.792821       1 leaderelection.go:245] attempting to acquire leader lease openshift-sriov-network-operator/a56def2a.openshift.io...
      I1123 13:25:56.033843       1 leaderelection.go:255] successfully acquired lease openshift-sriov-network-operator/a56def2a.openshift.io
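
The gap between the two log lines above can be checked with a few lines of Go. This is only a sketch for the arithmetic: the helper names `elapsed` and `elapsedString` are made up for illustration, and the layout string assumes both stamps are from the same day and use the microsecond clock format seen in these logs.

```go
package main

import (
	"fmt"
	"time"
)

// klogClock is the time-of-day layout seen in the two log lines (HH:MM:SS.micros).
const klogClock = "15:04:05.000000"

// elapsed returns the time between two same-day clock stamps.
// Hypothetical helper, not part of the operator.
func elapsed(start, end string) (time.Duration, error) {
	s, err := time.Parse(klogClock, start)
	if err != nil {
		return 0, err
	}
	e, err := time.Parse(klogClock, end)
	if err != nil {
		return 0, err
	}
	return e.Sub(s), nil
}

// elapsedString formats the result, or the parse error, as a string.
func elapsedString(start, end string) string {
	d, err := elapsed(start, end)
	if err != nil {
		return "error: " + err.Error()
	}
	return d.String()
}

func main() {
	// Stamps copied from the "attempting to acquire" and
	// "successfully acquired" lines in this report.
	fmt.Println(elapsedString("13:20:44.792821", "13:25:56.033843"))
}
```

The result is a little over five minutes, which matches the "5+ minutes" reported in the description.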
      
      
      
      
      The setting LeaderElectionReleaseOnCancel: true is missing from
       https://github.com/openshift/sriov-network-operator/blob/master/main.go#L104 
      
      The impact is slow recovery from any controlled restart of the sriov operator.
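
To illustrate why releasing the lease on shutdown matters, here is a minimal, standard-library-only simulation. It is a sketch under assumptions, not the operator's or controller-runtime's implementation: the `lease` type, the pod names, and the 300 s lease duration and 2 s retry period are hypothetical values chosen to resemble the delay in this report. In controller-runtime, the corresponding behavior is enabled by the `LeaderElectionReleaseOnCancel` field of the manager options, which makes the leader step down voluntarily when the manager stops.

```go
package main

import (
	"fmt"
	"time"
)

// lease is a simplified, hypothetical model of a leader-election lease record.
type lease struct {
	holder   string        // identity of the current leader, "" if none
	renewed  time.Time     // last time the holder renewed the lease
	duration time.Duration // how long the lease stays valid after a renewal
}

// tryAcquire hands the lease to candidate if it is unheld or has expired.
func (l *lease) tryAcquire(candidate string, now time.Time) bool {
	if l.holder != "" && now.Before(l.renewed.Add(l.duration)) {
		return false // still held by a (possibly already stopped) leader
	}
	l.holder, l.renewed = candidate, now
	return true
}

// release gives the lease up voluntarily; this is the step that a
// release-on-cancel setting adds to a graceful shutdown.
func (l *lease) release(holder string) {
	if l.holder == holder {
		l.holder = ""
	}
}

// restartDelaySeconds simulates one controlled restart and returns how many
// seconds the new instance waits before it becomes leader.
// All durations are illustrative values, not the operator's real settings.
func restartDelaySeconds(releaseOnCancel bool) int {
	const (
		leaseDuration = 300 * time.Second
		retryPeriod   = 2 * time.Second
	)
	t0 := time.Date(2023, 11, 23, 13, 20, 0, 0, time.UTC)
	l := &lease{duration: leaseDuration}
	l.tryAcquire("old-pod", t0) // old instance is leader and has just renewed

	if releaseOnCancel {
		l.release("old-pod") // old instance shuts down gracefully
	}

	start := t0.Add(5 * time.Second) // new instance starts shortly after
	now := start
	for !l.tryAcquire("new-pod", now) {
		now = now.Add(retryPeriod) // poll again after the retry period
	}
	return int(now.Sub(start) / time.Second)
}

func main() {
	fmt.Println("wait without release (s):", restartDelaySeconds(false))
	fmt.Println("wait with release (s):   ", restartDelaySeconds(true))
}
```

Without the release step, the new instance can only take over once the old lease has run out, so the wait is close to the full lease duration. With the release step, the first acquisition attempt succeeds.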

      Version-Release number of selected component (if applicable):

      4.14.0-202311060529

      How reproducible:

          100%

      Steps to Reproduce:

          1. Restart the sriov operator.

      Actual results:

          

      Expected results:

          

      Additional info:

          

              apanatto@redhat.com Andrea Panattoni
              browsell@redhat.com Brent Rowsell
              Dwaine Gonyier Dwaine Gonyier