Details

Type: Bug
Resolution: Won't Do
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.15.0
Component/s: Insights Operator
Labels:
- obsint-io

Regression:
No
Sprint:
CCXDEV Sprint 112
sprint_count:
1
Blocked:
False
Blocked Reason:
None

SFDC Cases Counter:
SFDC Cases Links:

Description

Description of problem:

    The Insights Operator reports DEGRADED a couple of hours after cluster creation, or after a pod restart.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1. Create a private, multi-AZ ROSA cluster on 4.15.0.

Actual results:

    NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    True       8h      Reporting was not allowed: your Red Hat account is not enabled for remote support or your token has expired: {"errors":[{"status":401,"detail":"UHC services authentication failed","meta":{"response_by":"gateway"}}]}...

pod logs:

I0315 10:25:17.987497 1 httplog.go:132] "HTTP" verb="GET" URI="/metrics" latency="3.515878ms" userAgent="Prometheus/2.48.0" audit-ID="64357a43-1f66-4b2d-a602-0fe0cf73aeff" srcIP="10.128.3.22:43000" resp=200
I0315 10:25:18.700669 1 controller.go:220] Number of last upload failures 9 exceeded the threshold 5. Marking as degraded.
I0315 10:25:18.700725 1 controller.go:428] The operator has some internal errors: Reporting was not allowed: your Red Hat account is not enabled for remote support or your token has expired: {"errors":[{"status":401,"detail":"UHC services authentication failed","meta":{"response_by":"gateway"}}]}

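It seems the pod fails to upload immediately, but the operator is not reported as DEGRADED until some time later because of the failure threshold (5, according to the logs): after a pod restart the failure count starts low again, so the operator is briefly reported as healthy.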

I0315 10:37:04.518142 1 controller.go:216] Number of last upload failures 1 lower than threshold 5. Not marking as degraded.
I0315 10:37:04.518182 1 controller.go:444] The operator is healthy

Expected results:

    The Insights Operator stays AVAILABLE.

Additional info:

    ocm staging cluster-id: 2a0d3io5jag9pprqrrf1i0j8nn8e37ke   
    org_id: "1HAXGgCYqHpednsRDiwWsZBmDlA"

People

Assignee: Tomas Remes

Reporter: Mulham Raee

QA Contact: Joao Bastos Fula

Votes: 1

Watchers: 5

Dates

Created: 2024/03/15 2:49 PM

Updated: 2024/03/19 2:59 PM

Resolved: 2024/03/19 2:59 PM