Machine Config Operator / MCO-846

Customizable Observability in the MCO

Details

    • Epic
    • Resolution: Unresolved
    • Normal
    • None
    • None
    • Customizable Observability in the MCO
    • To Do
    • OCPSTRAT-554 - Improving error handling, propagation, collection, and disambiguation for users

    Description

      Over time it has become clear that we need to improve most of the existing MCO metrics and add more metrics related to the MCC (Machine Config Controller). The MCC is responsible for watching the state of the pools, so it makes sense to add more metrics and alerting there in particular. We have run into, and are still running into, various problems with metrics. This epic aims to address those problems and to start adding more useful metrics and alerting to the MCO. A further aim of this epic (which could be split out) is to provide more data to help us proactively debug clusters when things go wrong.

      A preliminary spike is attached to this epic, along with other metrics-related cards. We need to work through and refine the spike before moving on, since it may let us close, move, or obsolete some of the attached cards.

      https://docs.google.com/document/d/1xBE_5VvLBGryVOj_Y_YDbOGk63l8hl9acTULMI1-jOU/edit#heading=h.c8x27sak9lvz

            People

              Unassigned
              Antonio Murdaca (amurdaca@redhat.com)
              Rio Liu
              Votes: 0
              Watchers: 5