-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
4.9
-
None
-
None
-
Rejected
-
False
-
Description of problem:
bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083757

If the Kubernetes API is down while kube-controller-manager is running the initOpCache function [1], it logs a fatal error and crashes.

Bug https://bugzilla.redhat.com/show_bug.cgi?id=2082628 was created separately to deal with the kubelet issues we observe following the crash; this bug is about the crash itself.

Version-Release number of selected component (if applicable):
At least as early as 4.9

How reproducible:
Not sure; see https://bugzilla.redhat.com/show_bug.cgi?id=2082628 for the context in which we noticed this crash. It is not clear whether the crash is rarely observed because of kubelet's behavior or because the crash itself is rare.

Steps to Reproduce:
See https://bugzilla.redhat.com/show_bug.cgi?id=2082628

Actual results:
kube-controller-manager crashes.

Expected results:
kube-controller-manager should be more tolerant of API downtime and not crash, because crashes add up in metrics/events and cause alerts and CI test failures.

Additional info:
None

[1] https://github.com/openshift/kubernetes/blob/fe7796f337ea0d35bc3e6b5428d63685d1833cb5/pkg/controller/namespace/deletion/namespaced_resources_deleter.go#L159-L165

It looks like this controller should behave more like GC does: it should try to scrape the data and, if it can't, just retry after a while. This is definitely something that should be pursued upstream. I'll bump the priority on it to get it into 1.25, ideally.
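
The retry-instead-of-crash behavior suggested above could look roughly like the following Go sketch. It is not the upstream fix and does not use any Kubernetes API: discoverFunc, populateWithRetry, and runDemo are hypothetical names, and the discovery call is simulated with a closure that fails twice before succeeding. The point is only that a failed discovery attempt is logged and retried with a capped backoff, and that an error is returned to the caller rather than the process being terminated.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// discoverFunc stands in for the discovery call made while populating the
// operation cache. Hypothetical type, not part of Kubernetes.
type discoverFunc func() error

// populateWithRetry calls discover until it succeeds or ctx is done,
// doubling the wait between attempts up to maxDelay. On failure it returns
// an error to the caller instead of exiting the process.
func populateWithRetry(ctx context.Context, discover discoverFunc, initial, maxDelay time.Duration) (int, error) {
	delay := initial
	for attempt := 1; ; attempt++ {
		err := discover()
		if err == nil {
			return attempt, nil
		}
		fmt.Printf("attempt %d failed: %v; retrying in %v\n", attempt, err, delay)

		select {
		case <-ctx.Done():
			return attempt, fmt.Errorf("giving up after %d attempts: %w", attempt, ctx.Err())
		case <-time.After(delay):
		}

		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
}

// runDemo simulates an API server that is unreachable for the first two
// calls and reachable on the third.
func runDemo() (int, error) {
	calls := 0
	discover := func() error {
		calls++
		if calls < 3 {
			return errors.New("connection refused")
		}
		return nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return populateWithRetry(ctx, discover, time.Millisecond, 10*time.Millisecond)
}

func main() {
	attempts, err := runDemo()
	fmt.Println("attempts:", attempts, "err:", err)
}
```

In a real controller the context would be tied to the controller's stop signal, so that retries end when the controller is shut down rather than after a fixed timeout.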
- is related to
-
WRKLDS-358 Expose additional information about GC and Quota under /debug endpoint
- To Do