  1. OpenShift GitOps
  2. GITOPS-2456

Expose instance level metrics in GitOps operators


    • Story
    • Resolution: Done
    • Major
    • 1.8.0
    • None
    • Operator
    • None
    • GITOPS Sprint 229

      Story (Required)

      As a cluster admin with the GitOps operator installed on my cluster, I would like the operator to expose instance-level health metrics so that I can consume them if I choose to.

      Background (Required)

      To set up alerts that notify us when instance workloads are down, we first need a way to encode this information in Prometheus. This requires defining new instance-level metrics within the operator and exposing them at a new endpoint so that Prometheus can consume them.

      Out of scope

      - metrics other than the status of a workload
      - setting up resources to actually send alerts out 

      Approach (Required)

      As part of this story we must:

      • Define and create new metrics within the operator to describe each workload's current status
      • Start a new metrics server goroutine within the operator, on a new port
      • Add operator logic wherever a workload's status can change, so that the metric values are tracked and updated locally
      • Use the Prometheus Go client library
      • Use the Argo CD Image Updater code for reference: https://github.com/argoproj-labs/argocd-image-updater/blob/master/pkg/metrics/metrics.go

       

      The metrics we need to focus on at this stage are:

      • applicationControllerStatus
      • applicationSetControllerStatus
      • repoServerStatus
      • serverStatus
      • notificationControllerStatus
      • dexStatus
      • redisStatus
      • argoCDPhaseStatus

      Each metric takes an integer value from 0 to 3, mapped as follows:

      “Unknown” = 0

      “Pending” = 1

      “Running” = 2

      “Available” = 3
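
      The mapping above can be written as a small lookup function. This is a sketch only: the name statusValue and the constant names are hypothetical, and treating any unrecognised status string as "Unknown" is an assumption the story does not spell out.

```go
package main

import "fmt"

// Gauge values for each status, as listed in the story.
const (
	statusUnknown   float64 = iota // 0
	statusPending                  // 1
	statusRunning                  // 2
	statusAvailable                // 3
)

// statusValue (hypothetical helper) converts a status string into the value
// published on the gauge. Anything unrecognised, including the empty string,
// is reported as Unknown (an assumption).
func statusValue(status string) float64 {
	switch status {
	case "Pending":
		return statusPending
	case "Running":
		return statusRunning
	case "Available":
		return statusAvailable
	default:
		return statusUnknown
	}
}

func main() {
	for _, s := range []string{"Unknown", "Pending", "Running", "Available", ""} {
		fmt.Printf("%q -> %v\n", s, statusValue(s))
	}
}
```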

       

      The value of each metric must track the status of the actual workload, and every change must be published to the /metrics endpoint immediately.

      Dependencies

      Depends on https://issues.redhat.com/browse/GITOPS-1989

      Acceptance Criteria (Mandatory)

      The operator starts a new metrics server internally and exposes the new metrics at the /metrics endpoint, where they can be consumed.

      INVEST Checklist

      Dependencies identified

      Blockers noted and expected delivery timelines set

      Design is implementable

      Acceptance criteria agreed upon

      Story estimated

      Legend

      Unknown

      Verified

      Unsatisfied

      Done Checklist

      • Code is completed, reviewed, documented and checked in
      • Unit and integration test automation have been delivered and are running cleanly in the continuous integration/staging/canary environment
      • Continuous Delivery pipeline(s) are able to proceed with the new code included
      • Customer facing documentation, API docs etc. are produced/updated, reviewed and published
      • Acceptance criteria are met

              jrao@redhat.com Jaideep Rao