  1. OpenShift API Server
  2. API-1373

Expose client-go metrics in Telemetry


    • Type: Task
    • Resolution: Obsolete
    • Priority: Undefined
    • openshift-4.11

      To alert on the availability of the various proxies in front of the apiserver, and thereby detect network issues between clients and the apiserver, we need more insight into the current latencies and error rates of these proxies.

      This use case is a good fit for SLO-based alerting (https://sre.google/workbook/alerting-on-slos/), since we want to know the current availability of the proxies and detect periods of unavailability that could be caused by network issues.
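
As a rough illustration of the burn-rate approach described in the linked SRE workbook chapter, the sketch below pages only when both a long and a short window burn the error budget faster than a threshold. The 99.9% target and the input ratios are hypothetical; the 14.4 threshold is the workbook's example for a 1h/5m window pair over a 30-day budget. This is not the alerting logic of the linked PR.

```python
def burn_rate(error_ratio, slo_target):
    """How many times faster than allowed the error budget is being spent."""
    budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO (hypothetical target)
    return error_ratio / budget


def should_page(long_window_ratio, short_window_ratio, slo_target, threshold=14.4):
    """Multiwindow check: both windows must exceed the burn-rate threshold.

    The short window makes the alert stop firing soon after the proxy recovers.
    """
    return (burn_rate(long_window_ratio, slo_target) > threshold
            and burn_rate(short_window_ratio, slo_target) > threshold)


if __name__ == "__main__":
    # 2% errors against a 99.9% SLO burns the budget about 20x too fast.
    print(should_page(0.02, 0.02, 0.999))
    # The long window is still hot but the short window has recovered: no page.
    print(should_page(0.02, 0.001, 0.999))
```

Requiring both windows trades a little detection delay for far fewer pages on brief blips.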

      To create SLO-based alerts, we first need to determine each proxy's SLO, which requires gathering data on its latency and error rate. For latency, we would ideally use the 99th percentile, since tail latency exposes failures that an average tends to hide.
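
The two signals above can be sketched from raw counters as follows. This is a minimal illustration, not the recording rules from the PR: the bucket bounds, the counts, and the convention that failures are codes starting with "5" or the literal "<error>" are assumptions made for the example. The quantile estimate interpolates linearly inside a cumulative bucket, in the same spirit as Prometheus' `histogram_quantile`.

```python
from math import inf, nan


def error_ratio(requests_by_code):
    """Fraction of requests that failed (assumed: 5xx codes or "<error>")."""
    total = sum(requests_by_code.values())
    if total == 0:
        return 0.0
    failed = sum(n for code, n in requests_by_code.items()
                 if code == "<error>" or code.startswith("5"))
    return failed / total


def quantile_from_buckets(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count), sorted by bound
    and ending with an (inf, total) bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == inf or count == lower_count:
                # Cannot interpolate into the +Inf bucket or an empty one.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


if __name__ == "__main__":
    # Hypothetical request counts by response code.
    print(error_ratio({"200": 990, "500": 7, "<error>": 3}))
    # Hypothetical latency histogram: the p99 falls in the (0.5s, 1.0s] bucket.
    print(quantile_from_buckets(0.99, [(0.1, 900), (0.5, 980), (1.0, 995), (inf, 1000)]))
```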

      The different types of connections to the apiserver that we want to get insights on are:

      • Direct connection
      • APIServer Service
      • Internal load balancer
      • Everything else

      Since we have 4 connection types and 2 signals to gather for each (latency and error rate), we should in theory send only 4 × 2 = 8 series to Telemetry.
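
The series count can be checked by enumerating the combinations. The label values and signal names below are hypothetical placeholders, not the names used by the recording rules in the PR.

```python
from itertools import product

# Hypothetical label values for the four connection types listed above.
CONNECTION_TYPES = ["direct", "apiserver-service", "internal-load-balancer", "other"]
# Hypothetical names for the two signals gathered per connection type.
SIGNALS = ["latency_p99", "error_rate"]


def telemetry_series():
    """One series per (connection type, signal) pair."""
    return [{"connection": c, "signal": s}
            for c, s in product(CONNECTION_TYPES, SIGNALS)]


if __name__ == "__main__":
    for series in telemetry_series():
        print(series)
```

Keeping every dimension to a fixed, small set of values is what bounds the cardinality sent to Telemetry.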

      A WIP document for the telemetry request is available here: https://docs.google.com/document/d/1AJ8H2K4h3FVPfPLVzivIFYQxmJLvl3vZjXn1_9jkjqk/edit?usp=sharing

      This PR creates recording rules that will then be used to send data via Telemetry: https://github.com/openshift/cluster-kube-apiserver-operator/pull/1272

            aojeagar Antonio Ojea (Inactive)
            dgrisonn@redhat.com Damien Grisonnet