Loading...

XML

Word

Printable

Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: 4.13.0
Affects Version/s: 4.12
Component/s: kube-controller-manager
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

Hide

None

Show
None
Story Points:
None
Severity:
None
Regression:
No

Target Backport Versions:
None
Target Version:

4.13.0
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
Release Note Not Required
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

This is a clone of issue ~~OCPBUGS-7440~~. The following is the description of the original issue:
—
Description of problem:

while trying to figure out why it takes so long to install Single node OpenShift I noticed that the kube-controller-manager cluster operator is degraded for ~5 minutes due to:
GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.119.108:9091: connect: connection refused
I don't understand how the prometheusClient is successfully initialized, but we get a connection refused once we try to query the rules.
Note that if the client initialization fails the kube-controller-manger won't set the  GarbageCollectorDegraded to true.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

100%

Steps to Reproduce:

1. install SNO with bootstrap in place (https://github.com/eranco74/bootstrap-in-place-poc)

2. monitor the cluster operators staus

Actual results:

GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.119.108:9091: connect: connection refused

Expected results:

Expected the GarbageCollectorDegraded status to be false

Additional info:

It seems that for PrometheusClient to be successfully initialised it needs to successfully create a connection but we get connection refused once we make the query.
Note that installing SNO with this patch (https://github.com/eranco74/cluster-kube-controller-manager-operator/commit/26e644503a8f04aa6d116ace6b9eb7b9b9f2f23f) reduces the installation time by 3 minutes

clones

OCPBUGS-7440 kube-controller-manager cluster operator is degraded due connection refused while querying rules

Closed

is blocked by

OCPBUGS-7440 kube-controller-manager cluster operator is degraded due connection refused while querying rules

Closed

links to

openshift/cluster-kube-controller-manager-operator#717: [release-4.13] OCPBUGS-11222: do not degrade KCM when when monitoring stack rollout is in progress

Assignee:: Filip Krepinsky

Reporter:: OpenShift Prow Bot

QA Contact:: Ying Zhou

Need Info From:: None

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Created:: 2023/03/31 3:07 PM

Updated:: 2025/07/27 11:32 AM

Resolved:: 2023/05/17 10:44 PM

Details

Description

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates