  1. OpenShift Bugs
  2. OCPBUGS-58194

Backport of Prometheus Operator Bug Fix: "One Alertmanager Config failing blocks all others" to OCP 4.18


    • Bug
    • Resolution: Done
    • Undefined
    • 4.18.0
    • 4.16.z
    • Monitoring
    • Quality / Stability / Reliability
    • False
    • 1
    • None
    • None
    • None
    • Mon Sprint 273, MON Sprint 274, MON Sprint 275, MON Sprint 276
    • 4
    • In Progress
    • Bug Fix
    • Before this fix, URLs passed in AlertmanagerConfig receivers, in particular in Slack and Discord configs, were not validated. With this fix, these URLs are validated, and invalid URLs are rejected.
    • None
    • None
    • None
    • None

      Description of problem:

      When an AlertmanagerConfig contains an invalid URL, the following logs are present in the prometheus-user-workload pods:

      26 jun 2025, 12:13:57.796 |   level=error ts=2025-06-26T10:13:57.785124884Z caller=klog.go:126 component=k8s_client_runtime func=ErrorDepth msg="sync \"openshift-user-workload-monitoring/user-workload\" failed: provision alertmanager configuration: failed to generate Alertmanager configuration: AlertmanagerConfig XXXXX/XXXXX: SlackConfig[0]: invalid URL \"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\" in key \"api_url\" from secret \"XXXXXXXXXXXX\": validate url from string failed for xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: unsupported scheme \"\" for URL"
          

      This occurs with Alertmanager enabled for user workload monitoring. It was reported with a Slack receiver in the Alertmanager config.
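
The "unsupported scheme" error in the log above can be illustrated with a minimal Go sketch. This is not the Prometheus Operator's code: `validateReceiverURL` is a hypothetical helper, the accepted schemes (http and https) are an assumption, and the example URL is a placeholder. It only shows why a string with no scheme is rejected.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateReceiverURL is a hypothetical helper (not an operator function).
// It rejects URLs that do not parse, that lack an http/https scheme,
// or that have no host.
func validateReceiverURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse url %q: %w", raw, err)
	}
	// Assumption: only http and https are acceptable for a webhook-style URL.
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q for URL", u.Scheme)
	}
	if u.Host == "" {
		return fmt.Errorf("missing host for URL")
	}
	return nil
}

func main() {
	// Placeholder inputs: one well-formed URL and one bare string with no scheme.
	for _, raw := range []string{"https://hooks.example.com/services/T000", "not-a-url"} {
		if err := validateReceiverURL(raw); err != nil {
			fmt.Printf("rejected %q: %v\n", raw, err)
			continue
		}
		fmt.Printf("accepted %q\n", raw)
	}
}
```

A bare string such as `not-a-url` parses as a path with an empty scheme, which is the situation the log message reports.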

      Environment:

      User-workload-monitoring, Alertmanager configuration.

      Important Notes:
      This issue is present in Prometheus Operator versions prior to 0.80.0. A fix was implemented in Prometheus Operator 0.80.0, and that fix is now present in OCP 4.19.

      Impact

      This issue affects customers running OCP clusters earlier than 4.19, especially those on Extended Update Support, who remain affected for the duration of their support unless they upgrade to a 4.19 cluster.

      Version-Release number of selected component (if applicable):

          4.16.z

      How reproducible:

          Easily reproducible on a 4.16 cluster.

      This is a request to backport this fix from Prometheus Operator 0.80.0 to earlier OCP versions.
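
The title of this issue describes one failing AlertmanagerConfig blocking all others. The following Go sketch illustrates the general idea of validating each config on its own and skipping only the invalid ones, so the remaining configs can still be used. It is an illustration under assumptions, not the operator's implementation: `receiverConfig`, `checkURL`, and `selectValid` are hypothetical names, and the config is reduced to a name and a single URL.

```go
package main

import (
	"fmt"
	"net/url"
)

// receiverConfig is a hypothetical, simplified stand-in for an AlertmanagerConfig
// that carries a single receiver URL.
type receiverConfig struct {
	Name   string
	APIURL string
}

// checkURL is a hypothetical check: the URL must parse and use http or https.
func checkURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse url %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q for URL", u.Scheme)
	}
	return nil
}

// selectValid validates every config independently. Invalid configs are
// recorded in rejected and skipped; they do not abort processing of the rest.
func selectValid(configs []receiverConfig) ([]receiverConfig, map[string]error) {
	var valid []receiverConfig
	rejected := make(map[string]error)
	for _, c := range configs {
		if err := checkURL(c.APIURL); err != nil {
			rejected[c.Name] = err
			continue
		}
		valid = append(valid, c)
	}
	return valid, rejected
}

func main() {
	// Placeholder configs: "team-b" has a URL with no scheme.
	configs := []receiverConfig{
		{Name: "team-a", APIURL: "https://hooks.example.com/services/A"},
		{Name: "team-b", APIURL: "not-a-url"},
		{Name: "team-c", APIURL: "https://hooks.example.com/services/C"},
	}
	valid, rejected := selectValid(configs)
	for _, c := range valid {
		fmt.Printf("using config %s\n", c.Name)
	}
	for name, err := range rejected {
		fmt.Printf("skipping config %s: %v\n", name, err)
	}
}
```

In this sketch the invalid `team-b` entry is reported and skipped, while `team-a` and `team-c` are still selected.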

              janantha@redhat.com Jayapriya Pai
              rhn-support-ccostell Cormac Costello
              None
              None
              Junqi Zhao Junqi Zhao
              None
              Votes: 4
              Watchers: 8
