-
Bug
-
Resolution: Unresolved
-
Critical
-
CNV v4.20.0
-
Quality / Stability / Reliability
-
0.42
-
False
-
-
False
-
CNV v4.20.3.rhel9-6
-
-
Important
-
Customer Reported
-
None
Description of problem:
In new 4.99 builds, KMP is now deployed with a readinessProbe and a livenessProbe in the manager container:

name: manager
command:
- /manager
readinessProbe:
  httpGet:
    path: /readyz
    port: webhook-server
    scheme: HTTPS
    httpHeaders:
    - name: Content-Type
      value: application/json
  initialDelaySeconds: 10
  timeoutSeconds: 1
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /healthz
    port: webhook-server
    scheme: HTTPS
    httpHeaders:
    - name: Content-Type
      value: application/json
  initialDelaySeconds: 15
  timeoutSeconds: 1
  periodSeconds: 20
  successThreshold: 1
  failureThreshold: 3

It turns out that on "real" clusters these timers are too short: KMP does not have time to reach the ready state, so the liveness probe restarts the KMP pod prematurely, causing a CrashLoopBackOff.
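One possible mitigation, sketched below, is to relax the probe timers so the manager has time to become ready on loaded clusters. The specific values are illustrative assumptions, not the shipped fix:

```yaml
# Illustrative only: relaxed probe timers for the kubemacpool manager
# container. The adjusted values are assumptions, not the upstream fix.
readinessProbe:
  httpGet:
    path: /readyz
    port: webhook-server
    scheme: HTTPS
  initialDelaySeconds: 30   # was 10; give the webhook server longer to start
  periodSeconds: 10
  timeoutSeconds: 5         # was 1; tolerate slow responses under real load
  failureThreshold: 6       # was 3; allow more consecutive failures
livenessProbe:
  httpGet:
    path: /healthz
    port: webhook-server
    scheme: HTTPS
  initialDelaySeconds: 60   # was 15; avoid killing the pod before it is ready
  periodSeconds: 20
  timeoutSeconds: 5
  failureThreshold: 6
```

Raising failureThreshold and timeoutSeconds on the liveness probe is generally safer than raising initialDelaySeconds alone, since startup time varies with cluster load.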
Version-Release number of selected component (if applicable):
v4.99.0.rhel9-2317
How reproducible:
100%, on clusters with real workloads (not CI/test clusters)
Steps to Reproduce:
1. Install or upgrade CNV to the specified version
2. Check the status of the KMP pod
Actual results:
The KMP pod is in CrashLoopBackOff
Expected results:
The KMP pod should reach the Ready state
Additional info:
Caused by: https://github.com/k8snetworkplumbingwg/kubemacpool/pull/540

This issue prevents KMP from functioning and blocks CNV upgrades.