Uploaded image for project: 'OpenShift Virtualization'
  1. OpenShift Virtualization
  2. CNV-71903

[CLOSED LOOP for] new readinessProbe and livenessProbe in KMP are too low and cause crashloop

XMLWordPrintable

    • Icon: Closed Loop Closed Loop
    • Resolution: Unresolved
    • Icon: Critical Critical
    • None
    • None
    • CNV Network
    • None
    • Future Sustainability
    • False
    • Hide

      None

      Show
      None
    • False
    • None
    • None

      Description of problem:

      in new 4.99 builds, KMP is now deployed with readinessProbe and livenessProbe in the manager container:
      
            readinessProbe:
              httpGet:
                path: /readyz
                port: webhook-server
                scheme: HTTPS
                httpHeaders:
                  - name: Content-Type
                    value: application/json
              initialDelaySeconds: 10
              timeoutSeconds: 1
              periodSeconds: 10
              successThreshold: 1
              failureThreshold: 3
            name: manager
            command:
              - /manager
            livenessProbe:
              httpGet:
                path: /healthz
                port: webhook-server
                scheme: HTTPS
                httpHeaders:
                  - name: Content-Type
                    value: application/json
              initialDelaySeconds: 15
              timeoutSeconds: 1
              periodSeconds: 20
              successThreshold: 1
              failureThreshold: 3
      
      It turns out, that on "real" clusters, these timers are too short and KMP doesn't have time to reach the ready state, and as a result the liveness probe causes the KMP pod to restart prematurely, casing a CrashLoopBackOff

      Version-Release number of selected component (if applicable):
      v4.99.0.rhel9-2317

      How reproducible:

      100%, on clusters with real workloads (not CI/test clusters)

      Steps to Reproduce:

      1. Install or upgrade CNV to the specified version
      2. check the status of the KMP pod
      3.
      

      Actual results:

      KMP pod is being crashlooped

      Expected results:

      KMP pod should be ready

      Additional info:

      caused by:
      https://github.com/k8snetworkplumbingwg/kubemacpool/pull/540
      
      This issue causes KMP not to function and blocks CNV upgrade.

       

              jhopper@redhat.com Jenifer Abrams
              dagur@redhat.com Daniel Gur
              Mordechai Lehrer Mordechai Lehrer
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated: