  1. OpenShift Bugs
  2. OCPBUGS-4363

TargetDown alert expression is NOT correctly joining kube-state-metrics metric

Details

    • Bug
    • Resolution: Done
    • Normal
    • None
    • 4.11.z, 4.10.z, 4.8.z
    • Monitoring
    • Moderate
    • MON Sprint 228, MON Sprint 229, MON Sprint 231

    Description

      This is a clone of issue OCPBUGS-1453. The following is the description of the original issue:

      Description of problem:

      The TargetDown alert fired when it should not have.
      Prometheus endpoints are not always properly unregistered, so the alert wrongly concludes that some Kube service endpoints are down.

      Version-Release number of selected component (if applicable):

      The problem has always been there.

      How reproducible:

      Not reproducible.
      Most of the time, Prometheus endpoints are properly unregistered.
      The aim here is to make the TargetDown Prometheus expression more resilient; the change can be tested against past metrics data in which the unregistration issue was encountered.
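
To illustrate why joining against kube-state-metrics data could make the expression more resilient, here is a minimal Python sketch, not the actual alert rule. The sample data, the label names, and the `percent_down` helper are all hypothetical, and the filtering step only mimics a label join on (namespace, pod). The exact metric and labels used by the real fix are not stated in this issue.

```python
from collections import defaultdict

# Hypothetical samples of the `up` metric: (job, namespace, service, pod, value).
UP_SAMPLES = [
    ("example-exporter", "example-ns", "example-svc", "pod-a", 1),
    ("example-exporter", "example-ns", "example-svc", "pod-b", 1),
    # Stale target: the pod is gone but its endpoint was never unregistered.
    ("example-exporter", "example-ns", "example-svc", "pod-old", 0),
]

# Hypothetical (namespace, pod) pairs currently reported by kube-state-metrics.
LIVE_PODS = {("example-ns", "pod-a"), ("example-ns", "pod-b")}


def percent_down(samples, live_pods=None):
    """Return {(job, namespace, service): percentage of targets with up == 0}.

    When live_pods is given, samples whose (namespace, pod) pair is not in it
    are dropped first, which mimics joining `up` with a kube-state-metrics
    series on those two labels.
    """
    total = defaultdict(int)
    down = defaultdict(int)
    for job, namespace, service, pod, value in samples:
        if live_pods is not None and (namespace, pod) not in live_pods:
            continue  # target has no matching live pod, so ignore it
        key = (job, namespace, service)
        total[key] += 1
        if value == 0:
            down[key] += 1
    return {key: 100.0 * down[key] / total[key] for key in total}


naive = percent_down(UP_SAMPLES)              # counts the stale target as down
joined = percent_down(UP_SAMPLES, LIVE_PODS)  # ignores the stale target
```

In this sketch the stale `pod-old` target makes one third of the service look down in the naive computation, while the joined computation reports nothing down. Replaying recorded `up` samples through both variants is one way to check a candidate expression against past data, as suggested above.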

      Steps to Reproduce:

      N/A
      

      Actual results:

      The TargetDown alert was triggered while the Kube service endpoints were all up and running.

      Expected results:

      The TargetDown alert should not have been triggered.

              People

                spasquie@redhat.com Simon Pasquier
                openshift-crt-jira-prow OpenShift Prow Bot
                Hongyan Li Hongyan Li
                Nicolas Grauss, Tomas Havlicek