Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 4.13, 4.12, 4.14
Component/s: kube-controller-manager
Labels: None

Severity: Important
Regression: No
Blocked: False
Blocked Reason: None


Description of problem:

When a validating webhook's timeout exceeds the hard-coded kube-controller-manager timeout of 5 seconds, the kube-controller-manager pods enter a CrashLoopBackOff state in a continuous leader-election loop, stalling the cluster completely, as described in
kube-controller-manager timeout is exceeded by validating webhook during CNI restart leading to degraded cluster state
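
To check whether a cluster is exposed to this condition, one option is to list the webhooks whose timeout is longer than the 5-second limit. The following is a minimal sketch, not a supported tool: the function name, the sample data, and the webhook names are hypothetical, and it assumes the input has the shape of `oc get validatingwebhookconfigurations -o json`.

```python
# Client timeout seen on the lock update request in this report (?timeout=5s).
KCM_LOCK_TIMEOUT_SECONDS = 5
# admissionregistration.k8s.io/v1 defaults timeoutSeconds to 10 when it is unset.
DEFAULT_WEBHOOK_TIMEOUT_SECONDS = 10


def find_slow_webhooks(config_list, limit=KCM_LOCK_TIMEOUT_SECONDS):
    """Return (config, webhook, timeout, failurePolicy) for webhooks slower than limit."""
    flagged = []
    for config in config_list.get("items", []):
        for hook in config.get("webhooks", []):
            timeout = hook.get("timeoutSeconds", DEFAULT_WEBHOOK_TIMEOUT_SECONDS)
            if timeout > limit:
                flagged.append((
                    config["metadata"]["name"],
                    hook["name"],
                    timeout,
                    hook.get("failurePolicy", "Fail"),  # v1 default is Fail
                ))
    return flagged


# Hypothetical sample standing in for the real command output.
sample = {
    "items": [
        {
            "metadata": {"name": "example-policy"},
            "webhooks": [
                {"name": "slow.example.com", "timeoutSeconds": 10, "failurePolicy": "Fail"},
                {"name": "fast.example.com", "timeoutSeconds": 3, "failurePolicy": "Fail"},
            ],
        }
    ]
}

for row in find_slow_webhooks(sample):
    print(row)
```

Against a real cluster, `sample` would be replaced by the parsed JSON output of the command above. A flagged webhook is only a candidate: it matters here if its rules match the resource used for the leader-election lock.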

Version-Release number of selected component (if applicable):

How reproducible:

Add a validating webhook with a timeout longer than 5 seconds and have the webhook fail.
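
A webhook of this kind could look roughly like the manifest generated below. This is an untested sketch for a disposable test cluster only, under stated assumptions: every name (`repro-slow-webhook`, `slow.repro.example.com`, the `repro-webhook` namespace, the `unreachable-backend` service) is hypothetical, and the rule matching `configmaps` updates is an assumption based on the lock resource seen in the log in this report. The backend has to hang until the timeout rather than fail fast, which in the original case happened during a CNI restart.

```python
import json


def build_slow_webhook(timeout_seconds=10):
    """Build a ValidatingWebhookConfiguration whose timeout exceeds 5 seconds."""
    return {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "ValidatingWebhookConfiguration",
        "metadata": {"name": "repro-slow-webhook"},  # hypothetical name
        "webhooks": [
            {
                "name": "slow.repro.example.com",  # hypothetical name
                "clientConfig": {
                    # Hypothetical service; it must hang, not refuse quickly.
                    "service": {
                        "namespace": "repro-webhook",
                        "name": "unreachable-backend",
                        "path": "/validate",
                        "port": 443,
                    }
                },
                # Assumption: match updates to the lock resource from the log.
                "rules": [
                    {
                        "operations": ["UPDATE"],
                        "apiGroups": [""],
                        "apiVersions": ["v1"],
                        "resources": ["configmaps"],
                        "scope": "Namespaced",
                    }
                ],
                "failurePolicy": "Fail",
                "timeoutSeconds": timeout_seconds,  # API accepts 1 to 30
                "sideEffects": "None",
                "admissionReviewVersions": ["v1"],
            }
        ],
    }


manifest = build_slow_webhook()
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `oc apply -f`, since the API accepts JSON manifests as well as YAML.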

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

The cluster is stalled: kube-controller-manager pods are in CrashLoopBackOff, continuously failing leader election.
- Pods are deleted but are not re-created automatically by the operator or daemonset.
- openshift-apiserver pods are crash-looping, but openshift-kube-apiserver pods are in a Running/available state.
- The API appears to stall on requests to create new resources, but deleting resources completes immediately.
- etcd appears healthy and is not in a read-only state.
- Master nodes are Ready, and API/API-INT is consistently reachable from both bastion and master nodes (the API is not flapping).

Expected results:

The cluster should not stall when a validating webhook fails or times out.

Additional info:

The kube-controller-manager pod logs show the following message repeatedly:
~~~
2023-12-12T14:29:59.457408575Z E1212 14:29:59.457354 1 leaderelection.go:367] Failed to update lock: Put "https://api-int.example.com:6443/api/v1/namespaces/kube-system/configmaps/kube-controller-manager?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
~~~
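
When triaging many such lines, it can help to pull out the request method, the lock resource path, and the client timeout. The sketch below does that for the line above; the function name and the regular expression are illustrative assumptions, written against this one message format only.

```python
import re
from urllib.parse import parse_qs, urlsplit

# The log line quoted in this report.
LOG_LINE = (
    '2023-12-12T14:29:59.457408575Z E1212 14:29:59.457354 1 leaderelection.go:367] '
    'Failed to update lock: Put "https://api-int.example.com:6443/api/v1/namespaces/'
    'kube-system/configmaps/kube-controller-manager?timeout=5s": net/http: request '
    'canceled (Client.Timeout exceeded while awaiting headers)'
)

LOCK_FAILURE = re.compile(r'Failed to update lock: (\w+) "([^"]+)": (.*)$')


def parse_lock_failure(line):
    """Extract method, path, timeout, and error from a lock-update failure line."""
    match = LOCK_FAILURE.search(line)
    if match is None:
        return None  # not a lock-update failure line
    method, url, error = match.groups()
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return {
        "method": method,
        "path": parts.path,
        "timeout": query.get("timeout", [None])[0],
        "error": error,
    }


print(parse_lock_failure(LOG_LINE))
```

For this line the parsed timeout is `5s` and the path points at the `kube-controller-manager` configmap in `kube-system`, which is the lock update that the client gives up on.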

Assignee: Filip Krepinsky

Reporter: Ilan Green

QA Contact: ying zhou

Votes: 4

Watchers: 9

Created: 2023/12/26 9:51 AM

Updated: 2024/09/03 12:51 PM
