OSPRH-12232

BZ#2311158 Prometheus alert flipping from firing to pending and back while metric is steady

    • 2
    • False
    • False
    • No Docs Impact
    • openstack-tripleo-heat-templates-11.6.1-2.20241220151710.9adcac6.el8ost
    • None
    • Approved
    • CloudOps 2024 Sprint 22, CloudOps 2024 Sprint 23, CloudOps 2024 Sprint 24, CloudOps 2024 Sprint 25, CloudOps 2025 Sprint 6
    • 5
    • Moderate

      Description of problem:
      Customer created a custom alert like this:

      ~~~
      - alert: OpenStack Service is Down
        annotations:
          summary: ' {{ $labels.process }} down on {{ $labels.host }} '
        expr: 'sensubility_container_health_status{process!="metrics_qdr"} == 0'
        for: 2m
        labels:
          severity: critical
      ~~~

      Taking the metric for the iscsid service on controller-0 as an example: even though the metric is steady, the alert flips continuously between firing and pending.

      Metric can be retrieved with:

      sensubility_container_health_status{container="sg-core", endpoint="prom-https", host="controller-0.localdomain", process="iscsid", service="stf1-xxxxxx-sens-meter"}

      The alert state flipping can be observed with:

      ALERTS{alertname="OpenStack Service is Down", container="sg-core", endpoint="prom-https", host="controller-0.localdomain", process="iscsid", service="stf1-xxxxxx-sens-meter", severity="critical"}
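
To quantify the flipping rather than eyeball it in the graph, the result of a range query on ALERTS could be post-processed as in the sketch below. This is only an illustration, not part of the original report: it assumes the data has the "matrix" shape returned by the Prometheus HTTP API range query (one series per `alertstate` label value, each sample equal to 1 while the alert is in that state). The function names and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sample shaped like the "result" field of a Prometheus
# range query on ALERTS{alertname="OpenStack Service is Down", ...}.
# Timestamps and values are invented for illustration only.
SAMPLE_MATRIX: List[Dict] = [
    {
        "metric": {"alertname": "OpenStack Service is Down", "alertstate": "pending"},
        "values": [[0, "1"], [30, "1"], [120, "1"]],
    },
    {
        "metric": {"alertname": "OpenStack Service is Down", "alertstate": "firing"},
        "values": [[60, "1"], [90, "1"], [150, "1"]],
    },
]


def alert_state_timeline(matrix: List[Dict]) -> List[Tuple[float, str]]:
    """Merge the per-state ALERTS series into one time-ordered list of (timestamp, state)."""
    points: List[Tuple[float, str]] = []
    for series in matrix:
        state = series["metric"].get("alertstate", "unknown")
        for ts, value in series["values"]:
            # Keep only samples where the alert is actually in this state.
            if float(value) == 1.0:
                points.append((float(ts), state))
    return sorted(points)


def count_flips(timeline: List[Tuple[float, str]]) -> int:
    """Count how often the state changes between consecutive samples."""
    flips = 0
    for (_, previous), (_, current) in zip(timeline, timeline[1:]):
        if previous != current:
            flips += 1
    return flips


if __name__ == "__main__":
    timeline = alert_state_timeline(SAMPLE_MATRIX)
    for ts, state in timeline:
        print(f"{ts:>6.0f}s  {state}")
    print("state changes:", count_flips(timeline))
```

A steady alert would show at most one change (pending to firing) over the window; a higher count over a period in which the metric stays at 0 would match the behaviour described here.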

      Version-Release number of selected component (if applicable):
      RHOSP 16.2
      STF 1.5

      How reproducible:
      The alert works as expected in one of our labs.

      Additional info:
      More info in private comments.


              lnatapov@redhat.com Leonid Natapov
              jira-bugzilla-migration RH Bugzilla Integration
              Leonid Natapov Leonid Natapov
              rhos-conplat-observability