  1. Machine Config Operator
  2. MCO-134

Centralize/standardize metrics registration/handler startup and teardown


    • Spike
    • Resolution: Done
    • Major
    • OCPSTRAT-554 - Improving error handling, propagation, collection, and disambiguation for users
    • MCO Sprint 240, MCO Sprint 243

      Metrics in the MCO are convoluted. We use some of the controller's code to start up the other metrics listeners. Originally only the MCD had metrics support; for MCO-74 we added it to the MCC, but we more or less did so by duplicating the method used in the MCD rather than revisiting those decisions. There is no cohesion on how metrics should look, or even what they mean, in the MCO. We need to decide how to implement this properly. Creating a pkg/metrics package (or something like it) should help users and future team members understand our observability story. Creation and teardown of metrics should all go through the same place.

      Acceptance Criteria

      • All components of the MCO should call a general metrics handler for metric registration and listener start.
      • This handler should eventually move into the health controller, to unify pool health monitoring and reporting tools.
      • Bring in a new health controller pod that:
        • takes care of all pool health reporting telemetry
        • registers and listens to metrics from the various parts of the MCO, including the operator, the controller, and the daemons, all in one place
        • reports all updates and metric changes to Prometheus
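
      To make the "one place for registration, listener start, and teardown" idea concrete, here is a rough Go sketch. It is not the MCO's code and not a proposal for the final API: Registry, StartListener, and the metric name are hypothetical, and the tiny in-memory registry only stands in for a real Prometheus registry so that the example runs with the standard library alone.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"sort"
	"strings"
	"sync"
	"time"
)

// Registry is a hypothetical stand-in for a Prometheus registry: it only
// tracks named gauges so the sketch needs nothing beyond the standard library.
type Registry struct {
	mu     sync.Mutex
	gauges map[string]float64
}

func NewRegistry() *Registry { return &Registry{gauges: map[string]float64{}} }

// Register adds a gauge starting at 0 and rejects duplicates, so two
// components cannot silently claim the same metric name.
func (r *Registry) Register(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.gauges[name]; ok {
		return fmt.Errorf("metric %q already registered", name)
	}
	r.gauges[name] = 0
	return nil
}

// Set updates a gauge; unknown names are ignored in this sketch.
func (r *Registry) Set(name string, v float64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.gauges[name]; ok {
		r.gauges[name] = v
	}
}

// Render returns one "name value" line per gauge, sorted by name.
func (r *Registry) Render() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	names := make([]string, 0, len(r.gauges))
	for n := range r.gauges {
		names = append(names, n)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, n := range names {
		fmt.Fprintf(&b, "%s %g\n", n, r.gauges[n])
	}
	return b.String()
}

// StartListener is the single entry point every component would call: it
// registers the caller's metrics, serves them on /metrics, and returns the
// bound address plus a stop function that tears the listener down.
func StartListener(addr string, reg *Registry, names ...string) (string, func() error, error) {
	for _, n := range names {
		if err := reg.Register(n); err != nil {
			return "", nil, err
		}
	}
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return "", nil, err
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, reg.Render())
	})
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln) // returns http.ErrServerClosed once Shutdown is called
	stop := func() error {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		return srv.Shutdown(ctx)
	}
	return ln.Addr().String(), stop, nil
}

func main() {
	reg := NewRegistry()
	// The metric name is made up for illustration.
	addr, stop, err := StartListener("127.0.0.1:0", reg, "example_daemon_state")
	if err != nil {
		panic(err)
	}
	reg.Set("example_daemon_state", 1)
	fmt.Println("serving metrics on", addr)
	if err := stop(); err != nil {
		panic(err)
	}
}
```

      The point of returning a stop function from the same call that registers and starts is that a component cannot start a listener without also receiving the way to tear it down. Whether that shape suits the operator, controller, and daemon equally well is one of the things this spike would need to settle.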

              rh-ee-iqian Ines Qian (Inactive)
              jkyros@redhat.com John Kyros