OpenShift Bugs / OCPBUGS-18440

Firing alert is not shown on the dev console Observe -> Alerts tab


    • Bug
    • Resolution: Won't Do
    • 4.13.z, 4.14.0
    • Observability UI
    • Quality / Stability / Reliability
    • Moderate
    • Rejected

      Description of problem:

      The InsightsDisabled, SimpleContentAccessNotAvailable, and InsightsRecommendationActive alerts are defined in a PrometheusRule in the openshift-insights namespace.

      NOTE: The InsightsDisabled and SimpleContentAccessNotAvailable alerts have a "namespace: openshift-insights" label in the PrometheusRule; the InsightsRecommendationActive alert does not.

      $ oc -n openshift-insights get prometheusrules insights-prometheus-rules -oyaml
      ...
      spec:
        groups:
        - name: insights
          rules:
          - alert: InsightsDisabled
            annotations:
              description: 'Insights operator is disabled. In order to enable Insights and
                benefit from recommendations specific to your cluster, please follow steps
                listed in the documentation: https://docs.openshift.com/container-platform/latest/support/remote_health_monitoring/enabling-remote-health-reporting.html'
              summary: Insights operator is disabled.
            expr: max without (job, pod, service, instance) (cluster_operator_conditions{name="insights",
              condition="Disabled"} == 1)
            for: 5m
            labels:
              namespace: openshift-insights
              severity: info
          - alert: SimpleContentAccessNotAvailable
            annotations:
              description: Simple content access (SCA) is not enabled. Once enabled, Insights
                Operator can automatically import the SCA certificates from Red Hat OpenShift
                Cluster Manager making it easier to use the content provided by your Red
                Hat subscriptions when creating container images. See https://docs.openshift.com/container-platform/latest/cicd/builds/running-entitled-builds.html
                for more information.
              summary: Simple content access certificates are not available.
            expr: ' max without (job, pod, service, instance) (max_over_time(cluster_operator_conditions{name="insights",
              condition="SCAAvailable", reason="NotFound"}[5m]) == 0)'
            for: 5m
            labels:
              namespace: openshift-insights
              severity: info
          - alert: InsightsRecommendationActive
            annotations:
              description: Insights recommendation "{{ $labels.description }}" with total
                risk "{{ $labels.total_risk }}" was detected on the cluster. More information
                is available at {{ $labels.info_link }}.
              summary: An Insights recommendation is active for this cluster.
            expr: insights_recommendation_active == 1
            for: 5m
            labels:
              severity: info 
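
The label difference noted above can be checked mechanically. The following is a minimal Python sketch, assuming the PrometheusRule spec has already been parsed into a dict (for example from JSON output). The `RULE_SPEC` literal is trimmed from the output above, and the helper name `alerts_missing_label` is hypothetical.

```python
# Trimmed copy of the rule spec shown above; only the fields needed here.
RULE_SPEC = {
    "groups": [
        {
            "name": "insights",
            "rules": [
                {"alert": "InsightsDisabled",
                 "labels": {"namespace": "openshift-insights", "severity": "info"}},
                {"alert": "SimpleContentAccessNotAvailable",
                 "labels": {"namespace": "openshift-insights", "severity": "info"}},
                {"alert": "InsightsRecommendationActive",
                 "labels": {"severity": "info"}},
            ],
        }
    ]
}


def alerts_missing_label(spec, label):
    """Return the names of alerting rules whose static labels lack `label`."""
    missing = []
    for group in spec.get("groups", []):
        for rule in group.get("rules", []):
            # Recording rules have no "alert" key; skip them.
            if "alert" in rule and label not in rule.get("labels", {}):
                missing.append(rule["alert"])
    return missing


print(alerts_missing_label(RULE_SPEC, "namespace"))
```

With the spec above, only InsightsRecommendationActive lacks a static namespace label.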

      On a default cluster with no PVs attached for Prometheus, the InsightsRecommendationActive alert fires. However, on the developer console Observe -> Alerts tab, the InsightsRecommendationActive alert is not listed; only the InsightsDisabled and SimpleContentAccessNotAvailable alerts appear.

      $ token=`oc create token prometheus-k8s -n openshift-monitoring`
      $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=ALERTS{alertname="InsightsRecommendationActive"}' | jq
      {
        "status": "success",
        "data": {
          "resultType": "vector",
          "result": [
            {
              "metric": {
                "__name__": "ALERTS",
                "alertname": "InsightsRecommendationActive",
                "alertstate": "firing",
                "container": "insights-operator",
                "description": "Prometheus metrics data will be lost when the Prometheus pod is restarted or recreated",
                "endpoint": "https",
                "info_link": "https://console.redhat.com/openshift/insights/advisor/clusters/fe975ccd-5e51-429b-9d65-718a09495195?first=ccx_rules_ocp.external.rules.empty_prometheus_db_volume|PROMETHEUS_DB_VOLUME_IS_EMPTY",
                "instance": "10.129.0.16:8443",
                "job": "metrics",
                "namespace": "openshift-insights",
                "pod": "insights-operator-766bfb4974-fllpl",
                "prometheus": "openshift-monitoring/k8s",
                "service": "metrics",
                "severity": "info",
                "total_risk": "Low"
              },
              "value": [
                1693554627.507,
                "1"
              ]
            }
          ]
        }
      }

      The thanos-querier rules API also shows that the InsightsRecommendationActive alert is firing. Note that for the InsightsRecommendationActive alert, the "namespace": "openshift-insights" label appears in the rules.alerts.labels section, not in the rules.labels section.

      $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules'  | jq
      ...
            {
              "name": "insights",
              "file": "/etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-insights-insights-prometheus-rules-64f80bc9-c9e8-49c5-ade2-cb63133d1555.yaml",
              "rules": [
                {
                  "state": "inactive",
                  "name": "InsightsDisabled",
                  "query": "max without (job, pod, service, instance) (cluster_operator_conditions{condition=\"Disabled\",name=\"insights\"} == 1)",
                  "duration": 300,
                  "labels": {
                    "namespace": "openshift-insights",
                    "prometheus": "openshift-monitoring/k8s",
                    "severity": "info"
                  },
                  "annotations": {
                    "description": "Insights operator is disabled. In order to enable Insights and benefit from recommendations specific to your cluster, please follow steps listed in the documentation: https://docs.openshift.com/container-platform/latest/support/remote_health_monitoring/enabling-remote-health-reporting.html",
                    "summary": "Insights operator is disabled."
                  },
                  "alerts": [],
                  "health": "ok",
                  "evaluationTime": 0.000247544,
                  "lastEvaluation": "2023-09-01T07:56:56.694516549Z",
                  "type": "alerting"
                },
                {
                  "state": "firing",
                  "name": "InsightsRecommendationActive",
                  "query": "insights_recommendation_active == 1",
                  "duration": 300,
                  "labels": {
                    "prometheus": "openshift-monitoring/k8s",
                    "severity": "info"
                  },
                  "annotations": {
                    "description": "Insights recommendation \"{{ $labels.description }}\" with total risk \"{{ $labels.total_risk }}\" was detected on the cluster. More information is available at {{ $labels.info_link }}.",
                    "summary": "An Insights recommendation is active for this cluster."
                  },
                  "alerts": [
                    {
                      "labels": {
                        "alertname": "InsightsRecommendationActive",
                        "container": "insights-operator",
                        "description": "Prometheus metrics data will be lost when the Prometheus pod is restarted or recreated",
                        "endpoint": "https",
                        "info_link": "https://console.redhat.com/openshift/insights/advisor/clusters/fe975ccd-5e51-429b-9d65-718a09495195?first=ccx_rules_ocp.external.rules.empty_prometheus_db_volume|PROMETHEUS_DB_VOLUME_IS_EMPTY",
                        "instance": "10.129.0.16:8443",
                        "job": "metrics",
                        "namespace": "openshift-insights",
                        "pod": "insights-operator-766bfb4974-fllpl",
                        "service": "metrics",
                        "severity": "info",
                        "total_risk": "Low"
                      },
                      "annotations": {
                        "description": "Insights recommendation \"Prometheus metrics data will be lost when the Prometheus pod is restarted or recreated\" with total risk \"Low\" was detected on the cluster. More information is available at https://console.redhat.com/openshift/insights/advisor/clusters/fe975ccd-5e51-429b-9d65-718a09495195?first=ccx_rules_ocp.external.rules.empty_prometheus_db_volume|PROMETHEUS_DB_VOLUME_IS_EMPTY.",
                        "summary": "An Insights recommendation is active for this cluster."
                      },
                      "state": "firing",
                      "activeAt": "2023-09-01T07:12:56.693852957Z",
                      "value": "1e+00",
                      "partialResponseStrategy": "WARN"
                    }
                  ],
                  "health": "ok",
                  "evaluationTime": 0.000428525,
                  "lastEvaluation": "2023-09-01T07:56:56.694899015Z",
                  "type": "alerting"
                },
                {
                  "state": "inactive",
                  "name": "SimpleContentAccessNotAvailable",
                  "query": "max without (job, pod, service, instance) (max_over_time(cluster_operator_conditions{condition=\"SCAAvailable\",name=\"insights\",reason=\"NotFound\"}[5m]) == 0)",
                  "duration": 300,
                  "labels": {
                    "namespace": "openshift-insights",
                    "prometheus": "openshift-monitoring/k8s",
                    "severity": "info"
                  },
                  "annotations": {
                    "description": "Simple content access (SCA) is not enabled. Once enabled, Insights Operator can automatically import the SCA certificates from Red Hat OpenShift Cluster Manager making it easier to use the content provided by your Red Hat subscriptions when creating container images. See https://docs.openshift.com/container-platform/latest/cicd/builds/running-entitled-builds.html for more information.",
                    "summary": "Simple content access certificates are not available."
                  },
                  "alerts": [],
                  "health": "ok",
                  "evaluationTime": 0.00012941,
                  "lastEvaluation": "2023-09-01T07:56:56.694768041Z",
                  "type": "alerting"
                }
              ],
              "interval": 30,
              "evaluationTime": 0.000823825,
              "lastEvaluation": "2023-09-01T07:56:56.694507342Z",
              "limit": 0,
              "partialResponseStrategy": "ABORT"
            }, 
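
To make the rules.labels versus rules.alerts.labels distinction concrete, here is a hedged Python sketch that scans one rule group from a rules API response for firing alerts whose namespace label exists only on the alert instance and not on the rule itself. The `SAMPLE_GROUP` dict is trimmed from the response above, and `namespace_only_on_alert` is a hypothetical helper name. Whether the dev console actually filters on the rule-level label is an assumption suggested by this report, not something it confirms.

```python
# Trimmed copy of the "insights" rule group from the response above.
SAMPLE_GROUP = {
    "name": "insights",
    "rules": [
        {"name": "InsightsDisabled", "state": "inactive", "type": "alerting",
         "labels": {"namespace": "openshift-insights", "severity": "info"},
         "alerts": []},
        {"name": "InsightsRecommendationActive", "state": "firing", "type": "alerting",
         "labels": {"severity": "info"},
         "alerts": [
             {"state": "firing",
              "labels": {"alertname": "InsightsRecommendationActive",
                         "namespace": "openshift-insights",
                         "severity": "info"}},
         ]},
        {"name": "SimpleContentAccessNotAvailable", "state": "inactive", "type": "alerting",
         "labels": {"namespace": "openshift-insights", "severity": "info"},
         "alerts": []},
    ],
}


def namespace_only_on_alert(group):
    """Return (rule name, namespace) pairs for firing alerts whose namespace
    label is present on the alert instance but absent from the rule labels."""
    found = []
    for rule in group.get("rules", []):
        if rule.get("type") != "alerting":
            continue
        rule_has_ns = "namespace" in rule.get("labels", {})
        for alert in rule.get("alerts", []):
            alert_ns = alert.get("labels", {}).get("namespace")
            if alert.get("state") == "firing" and alert_ns and not rule_has_ns:
                found.append((rule["name"], alert_ns))
    return found


print(namespace_only_on_alert(SAMPLE_GROUP))
```

For the sample group, the only match is InsightsRecommendationActive in openshift-insights, which is the alert reported as missing from the dev console.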

      See the dev console screenshot: https://drive.google.com/file/d/1CH8ihLYfJjxU3jbBsbZQWSGjRF5Wz48i/view?usp=sharing

      This issue also exists in 4.13, for example in 4.13.0-0.nightly-2023-08-31-163330.

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-08-28-154013

      How reproducible:

      always

      Steps to Reproduce:

      1. Check the openshift-insights alerts on the dev console Observe -> Alerts tab.
      

      Actual results:

      The firing alert is not shown on the dev console Observe -> Alerts tab; only the 2 non-firing alerts are listed.

      Expected results:

      The firing alert is shown on the dev console Observe -> Alerts tab.

      Additional info:

       

              gbernal@redhat.com Gabriel Bernal
              juzhao@redhat.com Junqi Zhao