OpenShift Bugs / OCPBUGS-35483

`PrometheusRemoteStorageFailures` alert failed to trigger

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Undefined
    • None
    • 4.16
    • Monitoring
    • None
    • Moderate
    • No
    • MON Sprint 256, MON Sprint 257
    • 2
    • False
    • Before the fix, the "PrometheusRemoteWriteBehind" alert was not triggered for a remote-write endpoint that was never reached by Prometheus. After the fix, the alert will activate to warn about these unreachability issues, which may occur due to connectivity problems or mistakes in the remote-write endpoint configuration.
    • Release Note Not Required
    • In Progress

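      Description of problem:

      The `PrometheusRemoteStorageFailures` alert does not fire when Prometheus is configured with a remote-write endpoint that it cannot reach.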

      Version-Release number of selected component (if applicable):

      4.16.0-0.nightly-2024-06-13-084629

      How reproducible:

      100%

      Steps to Reproduce:

      1. Apply the following ConfigMap:
      *****
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          prometheusK8s:
            remoteWrite:
              - url: "http://invalid-remote-storage.example.com:9090/api/v1/write"
                queue_config:
                  max_retries: 1
      *****
      
      2. Check the Prometheus logs:
      % oc logs -c prometheus prometheus-k8s-0 -n openshift-monitoring
      ...
      ts=2024-06-14T01:28:01.804Z caller=dedupe.go:112 component=remote level=warn remote_name=5ca657 url=http://invalid-remote-storage.example.com:9090/api/v1/write msg="Failed to send batch, retrying" err="Post \"http://invalid-remote-storage.example.com:9090/api/v1/write\": dial tcp: lookup invalid-remote-storage.example.com on 172.30.0.10:53: no such host"
      
      3. After 15 minutes, query for the alert and for the related metric:
      % oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=ALERTS{alertname="PrometheusRemoteStorageFailures"}' | jq
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100   145  100    78  100    67    928    797 --:--:-- --:--:-- --:--:--  1726
      {
        "status": "success",
        "data": {
          "resultType": "vector",
          "result": [],
          "analysis": {}
        }
      }
      
      % oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=prometheus_remote_storage_failures_total' | jq
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100   124  100    78  100    46   1040    613 --:--:-- --:--:-- --:--:--  1653
      {
        "status": "success",
        "data": {
          "resultType": "vector",
          "result": [],
          "analysis": {}
        }
      }
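The check in step 3 can be automated. The following is a minimal sketch, not part of the original report, that parses the JSON body returned by the query endpoint and reports whether any series came back. The function name `series_in_response` and the `EMPTY_RESPONSE` constant are hypothetical; the JSON layout is the one shown in the output above.

```python
import json


def series_in_response(response_text):
    """Return the label sets of all series in an instant-query response.

    Raises ValueError if the query did not succeed or did not return a vector.
    """
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    data = body.get("data", {})
    if data.get("resultType") != "vector":
        raise ValueError(f"unexpected result type: {data.get('resultType')}")
    # Each entry in "result" is one series; "metric" holds its labels.
    return [sample.get("metric", {}) for sample in data.get("result", [])]


# The response body observed in this report: success, but an empty vector.
EMPTY_RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [],
    "analysis": {}
  }
}
"""

if __name__ == "__main__":
    series = series_in_response(EMPTY_RESPONSE)
    print("alert present" if series else "alert absent")
```

An empty list means that the `ALERTS` series for the alert does not exist, which is the state this bug describes.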
      

      Actual results:

      The alert did not trigger.

      Expected results:

      The alert triggers, and both the alert and the metrics are visible.

      Additional info:

      The following metrics show `No datapoints found.`:
      prometheus_remote_storage_failures_total
      prometheus_remote_storage_samples_dropped_total
      prometheus_remote_storage_retries_total
      The value of `prometheus_remote_storage_samples_failed_total` is 0.
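One possible reading of these values, offered as an assumption rather than a confirmed root cause: the report does not show the alert rule, but if it compares a failure ratio (failed / (failed + succeeded), as a percentage) against a threshold, then a failed rate of 0 together with a succeeded rate that is 0 or missing can never satisfy it. The sketch below models that behaviour in plain Python. The function name, the ratio formula, and the 1% threshold are all hypothetical stand-ins.

```python
import math


def failure_ratio_fires(failed_rate, succeeded_rate, threshold_percent=1.0):
    """Model a ratio-based alert condition (assumed, not taken from the report).

    None stands for a series with no datapoints. In PromQL, a binary
    operation with a missing operand yields no result, so nothing fires.
    """
    if failed_rate is None or succeeded_rate is None:
        return False
    total = failed_rate + succeeded_rate
    # PromQL evaluates 0/0 to NaN; Python would raise, so model it explicitly.
    ratio = failed_rate / total * 100 if total != 0 else math.nan
    # Any comparison with NaN is false, so a NaN ratio never fires.
    return ratio > threshold_percent


if __name__ == "__main__":
    # Values resembling this report: failed samples stay at 0.
    print(failure_ratio_fires(0.0, 0.0))
    print(failure_ratio_fires(0.0, None))
    # A partially failing endpoint, for contrast.
    print(failure_ratio_fires(5.0, 95.0))
```

Under these assumptions, an endpoint that is never reached produces neither failed nor succeeded samples, so a ratio-based condition stays silent. That is consistent with the fix described in this issue, which makes a different alert cover the unreachable case.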

              rh-ee-amrini Ayoub Mrini
              tagao@redhat.com Tai Gao
              Tai Gao Tai Gao
              Eliska Romanova Eliska Romanova