-
Bug
-
Resolution: Done-Errata
-
Normal
-
None
-
4.14
-
None
-
Moderate
-
No
-
MON Sprint 256, MON Sprint 257
-
2
-
False
-
-
NA
-
Release Note Not Required
-
In Progress
Description of problem:
While debugging https://docs.google.com/document/d/10kcIQPsn2H_mz7dJx3lbZR2HivjnC_FAnlt2adc53TY/edit#heading=h.egy1agkrq2v1, we came across this log:
2023-07-31T16:51:50.240749863Z W0731 16:51:50.240586 1 tasks.go:72] task 3 of 15: Updating Prometheus-k8s failed: [unavailable (unknown): client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline, degraded (unknown): client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline]
After some searching, we understood that the log means that ValidatePrometheus timed out waiting for Prometheus to become ready. The
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
See https://redhat-internal.slack.com/archives/C02BQCCFZPX/p1690892059971129?thread_ts=1690873617.023399&cid=C02BQCCFZPX for how to make the function time out.
Actual results:
Expected results:
- Clearer logs.
- Some of the information that we currently log would make more sense as part of the error, for example: https://github.com/openshift/cluster-monitoring-operator/blob/af831de434ce13b3edc0260a468064e0f3200044/pkg/client/client.go#L890
- Make messages such as "unavailable (unknown):" clearer, as we cannot understand what they mean without referring to the code.
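To illustrate the kind of error asked for above, here is a minimal sketch using only the Go standard library. It is not the operator's code: `waitForReady`, `demo`, and the "1 of 2 replicas ready" status are hypothetical names and values made up for illustration, and the real code waits through PollUntilContextTimeout rather than a hand-written loop. The idea is that the timeout error names what was being waited for and carries the last observed failure, instead of only surfacing the bare deadline error.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitForReady is a hypothetical helper (an assumption, not the operator's
// code). It calls check until it returns nil or ctx expires. On expiry it
// reports what it was waiting for and the last reason the check failed.
func waitForReady(ctx context.Context, what string, interval time.Duration, check func() error) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var lastErr error
	for {
		if lastErr = check(); lastErr == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			// Wrap the context error so callers can still use errors.Is,
			// and append the last observed failure for human readers.
			return fmt.Errorf("waiting for %s to become ready: %w (last observed: %v)",
				what, ctx.Err(), lastErr)
		case <-ticker.C:
		}
	}
}

// demo runs waitForReady against a check that never succeeds.
func demo() (msg string, timedOut bool) {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	err := waitForReady(ctx, "prometheus-k8s", 10*time.Millisecond, func() error {
		// Made-up status, standing in for whatever the readiness check saw.
		return errors.New("1 of 2 replicas ready")
	})
	return err.Error(), errors.Is(err, context.DeadlineExceeded)
}

func main() {
	msg, timedOut := demo()
	fmt.Println(timedOut)
	fmt.Println(msg)
}
```

With an error shaped like this, the message that reaches the logs and the ClusterOperator conditions would say which component did not become ready and why, without the reader having to consult the code.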
Additional info:
- Do the same for the other functions that wait for other components, if they use the same wait mechanism (PollUntilContextTimeout...).
- See https://redhat-internal.slack.com/archives/C02BQCCFZPX/p1690873617023399 for more details.
- See https://redhat-internal.slack.com/archives/C0VMT03S5/p1691069196066359?thread_ts=1690827144.818209&cid=C0VMT03S5 for the Slack discussion.
- is duplicated by: OCPBUGS-37595 Cluster operator's conditions are unclear when a pod fails to become ready (Closed)
- is related to: OCPBUGS-35892 monitoring ClusterOperator should not blip Available=Unknown on client rate limiter (Closed)
- links to: RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update