-
Bug
-
Resolution: Done
-
Undefined
-
4.12
-
None
-
No
-
False
-
-
Release Note Not Required
This is a clone of issue OCPBUGS-7440. The following is the description of the original issue:
—
Description of problem:
while trying to figure out why it takes so long to install Single node OpenShift I noticed that the kube-controller-manager cluster operator is degraded for ~5 minutes due to: GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.119.108:9091: connect: connection refused I don't understand how the prometheusClient is successfully initialized, but we get a connection refused once we try to query the rules. Note that if the client initialization fails the kube-controller-manger won't set the GarbageCollectorDegraded to true.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
100%
Steps to Reproduce:
1. install SNO with bootstrap in place (https://github.com/eranco74/bootstrap-in-place-poc) 2. monitor the cluster operators staus
Actual results:
GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.119.108:9091: connect: connection refused
Expected results:
Expected the GarbageCollectorDegraded status to be false
Additional info:
It seems that for PrometheusClient to be successfully initialised it needs to successfully create a connection but we get connection refused once we make the query. Note that installing SNO with this patch (https://github.com/eranco74/cluster-kube-controller-manager-operator/commit/26e644503a8f04aa6d116ace6b9eb7b9b9f2f23f) reduces the installation time by 3 minutes
- clones
-
OCPBUGS-7440 kube-controller-manager cluster operator is degraded due connection refused while querying rules
- Closed
- is blocked by
-
OCPBUGS-7440 kube-controller-manager cluster operator is degraded due connection refused while querying rules
- Closed
- links to