Type: Sub-task
Resolution: Obsolete
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Labels:
None

Activity Type:
None
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
None
Story Points:
None

Target Version:
None
Release Blocker:
None
Sprint:
Workloads Sprint 208, Workloads Sprint 210, Workloads Sprint 211, Workloads Sprint 212, Workloads Sprint 214, Workloads Sprint 215, Workloads Sprint 216, Workloads Sprint 217, Workloads - 4.12, Workloads Sprint 225, Workloads Sprint 226, Workloads Sprint 227, Workloads Sprint 228, Workloads Sprint 229, Workloads Sprint 230, Workloads Sprint 231, Workloads Sprint 232, Workloads Sprint 233, Workloads Sprint 234, Workloads Sprint 235, Workloads Sprint 236, Workloads Sprint 237, Workloads Sprint 238, Workloads Sprint 239, Workloads Sprint 240, Workloads Sprint 241

kube-controller-manager should be more tolerant of API downtime and not crash, as crashes add up in metrics/events and cause alerts / CI test failures (see https://issues.redhat.com/browse/OCPBUGS-5806 / https://bugzilla.redhat.com/show_bug.cgi?id=2083757)

If the Kubernetes API is down while kube-controller-manager is running the initOpCache function [1], it logs a fatal error and crashes.

The https://bugzilla.redhat.com/show_bug.cgi?id=2082628 bz was created separately to deal with the kubelet issues we observe following the crash, while this bz is about the crash itself.

Not sure, see https://bugzilla.redhat.com/show_bug.cgi?id=2082628 for the context in which we notice this crash. It's not entirely clear whether the rarity is due to kubelet's behavior or whether this crash itself is rare.

[1] https://github.com/openshift/kubernetes/blob/fe7796f337ea0d35bc3e6b5428d63685d1833cb5/pkg/controller/namespace/deletion/namespaced_resources_deleter.go#L159-L165

It looks like this controller should behave more like GC does: it should try to scrape the data and, if it can't, just retry after a while.
This is definitely something that should be pursued upstream. I'll bump the priority on it to ideally get it into 1.25.

Assignee: Unassigned

Reporter: Filip Krepinsky

Need Info From: None

Contributors: None

QA Contact: None

Doc Contact: None

Votes: 0

Watchers: 2

Created: 2023/04/20 4:12 PM

Updated: 2024/05/13 8:46 PM

Resolved: 2024/05/13 8:46 PM
