Details
-
Bug
-
Resolution: Won't Do
-
Undefined
-
None
-
4.15.0
-
No
-
CCXDEV Sprint 112
-
1
-
False
-
Description
Description of problem:
The Insights Operator reports DEGRADED a couple of hours after cluster creation, or some time after a pod restart.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create a private, multi-AZ ROSA cluster on 4.15.0.
Actual results:
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
True 8h Reporting was not allowed: your Red Hat account is not enabled for remote support or your token has expired: {"errors":[{"status":401,"detail":"UHC services authentication failed","meta":{"response_by":"gateway"}}]}...
pod logs:
I0315 10:25:17.987497 1 httplog.go:132] "HTTP" verb="GET" URI="/metrics" latency="3.515878ms" userAgent="Prometheus/2.48.0" audit-ID="64357a43-1f66-4b2d-a602-0fe0cf73aeff" srcIP="10.128.3.22:43000" resp=200
I0315 10:25:18.700669 1 controller.go:220] Number of last upload failures 9 exceeded the threshold 5. Marking as degraded.
I0315 10:25:18.700725 1 controller.go:428] The operator has some internal errors: Reporting was not allowed: your Red Hat account is not enabled for remote support or your token has expired: {"errors":[{"status":401,"detail":"UHC services authentication failed","meta":{"response_by":"gateway"}}]}
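It seems the pod fails to upload from the start, but the operator is not reported as DEGRADED until some time later, because the number of upload failures must first exceed the failure threshold (5 in the logs). After a pod restart the counter starts over: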
I0315 10:37:04.518142 1 controller.go:216] Number of last upload failures 1 lower than threshold 5. Not marking as degraded.
I0315 10:37:04.518182 1 controller.go:444] The operator is healthy
Expected results:
insights-operator stays AVAILABLE and is not reported as DEGRADED.
Additional info:
OCM staging
cluster-id: 2a0d3io5jag9pprqrrf1i0j8nn8e37ke
org_id: "1HAXGgCYqHpednsRDiwWsZBmDlA"