-
Bug
-
Resolution: Unresolved
-
Critical
-
CNV v4.20.0
-
Quality / Stability / Reliability
-
0.42
-
False
-
-
False
-
CNV v4.20.3.rhel9-6
-
-
Important
-
Customer Reported
-
None
Description of problem:
In new 4.99 builds, KMP is now deployed with a readinessProbe and a livenessProbe in the manager container:

name: manager
command:
- /manager
readinessProbe:
  httpGet:
    path: /readyz
    port: webhook-server
    scheme: HTTPS
    httpHeaders:
    - name: Content-Type
      value: application/json
  initialDelaySeconds: 10
  timeoutSeconds: 1
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /healthz
    port: webhook-server
    scheme: HTTPS
    httpHeaders:
    - name: Content-Type
      value: application/json
  initialDelaySeconds: 15
  timeoutSeconds: 1
  periodSeconds: 20
  successThreshold: 1
  failureThreshold: 3

It turns out that on "real" clusters these timers are too short: KMP does not have time to reach the ready state, so the liveness probe restarts the KMP pod prematurely, causing a CrashLoopBackOff.
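One possible mitigation, sketched below, is to relax the probe timers so the manager has time to become ready on loaded clusters. The specific values are illustrative assumptions, not the shipped fix:

```yaml
# Illustrative only: relaxed probe timers for the kubemacpool manager
# container. The adjusted values are assumptions, not the upstream fix.
readinessProbe:
  httpGet:
    path: /readyz
    port: webhook-server
    scheme: HTTPS
  initialDelaySeconds: 30   # was 10; give the webhook server longer to start
  periodSeconds: 10
  timeoutSeconds: 5         # was 1; tolerate slow responses under real load
  failureThreshold: 6       # was 3; allow more consecutive failures
livenessProbe:
  httpGet:
    path: /healthz
    port: webhook-server
    scheme: HTTPS
  initialDelaySeconds: 60   # was 15; avoid killing the pod before it is ready
  periodSeconds: 20
  timeoutSeconds: 5
  failureThreshold: 6
```

Raising failureThreshold and timeoutSeconds on the liveness probe is generally safer than raising initialDelaySeconds alone, since startup time varies with cluster load.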
Version-Release number of selected component (if applicable):
v4.99.0.rhel9-2317
How reproducible:
100%, on clusters with real workloads (not CI/test clusters)
Steps to Reproduce:
1. Install or upgrade CNV to the specified version
2. Check the status of the KMP pod
Actual results:
The KMP pod is in CrashLoopBackOff
Expected results:
The KMP pod should reach the Ready state
Additional info:
Caused by: https://github.com/k8snetworkplumbingwg/kubemacpool/pull/540

This issue prevents KMP from functioning and blocks CNV upgrades.