Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: 4.9
Component/s: kube-controller-manager
Labels:
None

Regression:
None
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:

None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083757

If the Kubernetes API is down while kube-controller-manager is running the initOpCache function [1], it logs a fatal error and crashes.

A separate bz, https://bugzilla.redhat.com/show_bug.cgi?id=2082628, was created to deal with the kubelet issues we observe following the crash; this bz is about the crash itself.

Version-Release number of selected component (if applicable):
At least as early as 4.9

How reproducible:
Not sure; see https://bugzilla.redhat.com/show_bug.cgi?id=2082628 for the context in which we notice this crash. It's not entirely clear whether the rarity is due to kubelet's behavior or whether the crash itself is rare.

Steps to Reproduce:
See https://bugzilla.redhat.com/show_bug.cgi?id=2082628

Actual results:
kube-controller-manager crashes

Expected results:
kube-controller-manager should be more tolerant of API downtime and not crash, as crashes add up in metrics/events and cause alerts / CI test failures

Additional info:
None

[1] https://github.com/openshift/kubernetes/blob/fe7796f337ea0d35bc3e6b5428d63685d1833cb5/pkg/controller/namespace/deletion/namespaced_resources_deleter.go#L159-L165


It looks like this controller should behave more like GC does: it should try to scrape the data and, if it can't, just retry after a while.
This should definitely be pursued upstream. I'll bump the priority on it to get it into 1.25, ideally.

is related to

WRKLDS-358 Expose additional information about GC and Quota under /debug endpoint

To Do

Assignee: Filip Krepinsky

Reporter: Filip Krepinsky

QA Contact: ying zhou

Need Info From: Filip Krepinsky

Votes: 0

Watchers: 3

Created: 2023/01/12 10:35 PM

Updated: 2023/04/20 4:24 PM

Resolved: 2023/04/20 4:24 PM
