Type: Epic
Resolution: Obsolete
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Labels:
- mco_qe_required

Epic Name:
Customizable Observability in the MCO
Epic Status:
To Do
Feature Link:
OCPSTRAT-554 - Improving error handling, propagation, collection, and disambiguation for users
Parent Link:
OCPSTRAT-554 - Improving error handling, propagation, collection, and disambiguation for users
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done

WSJF:
0

It became clear over time that we need to enhance most of the existing MCO metrics and add more metrics related to the MCC. The MCC is tasked with watching what's going on with pools, so it makes sense to add more metrics and alerting there in particular. We have run into, and are still running into, various hiccups with metrics. This epic aims to address those and to start adding more useful metrics and alerting to the MCO. Another aim for this epic (which we can split out) is to provide more data to help us proactively debug clusters when things go wrong.

There's a preliminary spike attached to this epic (as well as more metrics-related cards) that we need to hash out and refine before moving on. The spike may help us close, move, or obsolete some of the attached cards.

https://docs.google.com/document/d/1xBE_5VvLBGryVOj_Y_YDbOGk63l8hl9acTULMI1-jOU/edit#heading=h.c8x27sak9lvz

Screenshot from 2023-10-09 17-09-22.png
2023/11/14 8:56 PM
88 kB
Michelle Krejci

clones

MCO-1 Observability Infrastructure and Enhanced metrics in MCO

Closed

relates to

MCO-690 MVP: Implement StateController

Closed

MCO-717 Create Pod for State Controller (RBAC)

Closed

MCO-718 Centralize metric registering and listening for all MCO-subcomponent in a unified place

Closed

MCO-818 Setting up RBAC for prometheus in the StateController

Closed

MCO-134 Centralize/standardize metrics registration/handler startup and teardown

Closed

MCO-691 Investigate structure / viability of the HealthController using the pool progression statuses

Closed

MCO-751 Design an eventing system

Closed

MCO-827 Inter-pod Communication (MCO <--> state controller)

Closed

(4 relates to)

Assignee: Unassigned

Reporter: Antonio Murdaca

QA Contact: Rio Liu

Votes: 0

Watchers: 5

Created: 2023/11/14 8:56 PM

Updated: 2025/03/30 3:53 PM

Resolved: 2024/09/11 5:15 PM
