
Type: Task
Resolution: Done
Priority: Major
Fix Version/s: ConsoleDot CY23Q1
Affects Version/s: None
Component/s: None
Labels:
- platform-accessmanagement
- platform-infrastructure

Epic Link:
RHCLOUD-23489
Blocked:
False
Blocked Reason:
None
Ready:
False
BZ requires_doc_text:
Unset
Regression:
None
BZ Keywords:
- Unset

Sprint:
Platform A&M Sprint 53, Platform A&M Sprint 54, Platform A&M Sprint 55, Platform A&M Sprint 56, Platform A&M Sprint 57, Platform A&M Sprint 58, Platform A&M Sprint 59, Platform A&M Sprint 60, Platform A&M Sprint 61

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

OpenShift clusters occasionally report a "degraded" status due to errors encountered during CCX data uploads.
See https://issues.redhat.com/browse/SDB-3091 and https://issues.redhat.com/browse/CCXDEV-9209.

The errors appear to be related to authentication failures reported by uhc-auth-proxy:

https://github.com/RedHatInsights/uhc-auth-proxy/blob/master/server/server.go#L141

We need to enhance logging and monitoring so we can better determine the root cause of the errors:

(1) Add CloudWatch logging so we have better log retention. Logging should include detailed error codes and messages.
(2) Add Prometheus metrics so we can get a better view of failure patterns.
(3) Add alerts based on (2).
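
As a rough illustration of items (1) and (2), the sketch below wraps an HTTP handler so that every response is counted by status code and every 4xx/5xx response is logged with its code and reason. It is not taken from uhc-auth-proxy: `statusCounter`, `statusRecorder`, and `instrument` are hypothetical names, and the mutex-guarded map is a standard-library stand-in for a Prometheus counter labelled by status code. Shipping the log lines to CloudWatch and defining alert rules are assumed to be handled outside this code.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"sync"
)

// statusCounter is a stdlib stand-in for a Prometheus counter
// labelled by HTTP status code (hypothetical, for illustration only).
type statusCounter struct {
	mu     sync.Mutex
	counts map[int]int
}

func newStatusCounter() *statusCounter {
	return &statusCounter{counts: make(map[int]int)}
}

func (c *statusCounter) inc(code int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[code]++
}

func (c *statusCounter) get(code int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[code]
}

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// instrument wraps next so that every response is counted by status code
// and every 4xx/5xx response is logged with its code and status text.
func instrument(c *statusCounter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		c.inc(rec.status)
		if rec.status >= 400 {
			log.Printf("request failed: method=%s path=%s status=%d reason=%q",
				r.Method, r.URL.Path, rec.status, http.StatusText(rec.status))
		}
	})
}

func main() {
	counter := newStatusCounter()
	// Hypothetical handler that always rejects the caller.
	deny := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "unauthorized", http.StatusUnauthorized)
	})
	h := instrument(counter, deny)
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	fmt.Println("401 responses:", counter.get(http.StatusUnauthorized))
}
```

Counting by status code keeps authentication failures (401/403) separable from upstream failures (5xx), which is the kind of failure pattern that alerts in item (3) could key on.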

clones

RHCLOUD-21184 Enhance Monitoring for 5xx API errors

Closed

Assignee: Daniel Agbay

Reporter: Lindani Phiri

Involved: Drew Bomhof, Eric Himmelreich, Tomas Remes

Votes: 0

Watchers: 5

Created: 2022/10/04 3:50 PM

Updated: 2023/03/15 8:04 PM

Resolved: 2023/03/06 3:43 PM
