OpenShift Bugs / OCPBUGS-74979

Monitoring should be degraded when an AlertmanagerConfig references a missing secret

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • 4.22.0
    • Component: Monitoring
    • False
    • Low
    • Yes

      Description of problem:

      Enable user workload monitoring (UWM):

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          enableUserWorkload: true

      Enable the UWM Alertmanager and enableAlertmanagerConfig:

      apiVersion: v1
      kind: ConfigMap
      data:
        config.yaml: |
          alertmanager:
            enabled: true
            enableAlertmanagerConfig: true
      metadata:
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring

      Create a custom AlertmanagerConfig that references the secret my-workflow-webhook, which does not exist:

      $ oc new-project noodles;
      $ oc create -f - << eof
      apiVersion: monitoring.coreos.com/v1beta1
      kind: AlertmanagerConfig
      metadata:
        name: example
        namespace: noodles
      spec:
        route:
          groupBy:
          - namespace
          receiver: msteams
        receivers:
        - name: msteams
          msteamsConfigs:
          - webhookUrl:
              key: url
              name: my-workflow-webhook # k8s secret name in same namespace as AlertmanagerConfig
            sendResolved: true
            title: "mytitle"
            text: "mytext"
      eof
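      One way to make this AlertmanagerConfig valid again (not part of the original reproduction) is to create the Secret it points to. The sketch below assumes `oc` is logged in to the cluster; the function name `ensure_webhook_secret` and the webhook URL you pass in are hypothetical placeholders. The namespace, secret name and key mirror the example above.

```shell
# Sketch: create the Secret referenced by an AlertmanagerConfig if it is missing.
# Arguments: $1 namespace, $2 secret name, $3 key, $4 webhook URL.
# The function name and the URL value are hypothetical, not from the report.
ensure_webhook_secret() {
  # `oc get` exits non-zero when the Secret does not exist
  if oc -n "$1" get secret "$2" >/dev/null 2>&1; then
    echo "secret $1/$2 already exists"
    return 0
  fi
  # Store the URL under the key the AlertmanagerConfig expects (here: url)
  oc -n "$1" create secret generic "$2" --from-literal="$3=$4"
}
```

      Usage, with `WEBHOOK_URL` as a placeholder you supply: `ensure_webhook_secret noodles my-workflow-webhook url "$WEBHOOK_URL"`. The existence check keeps the helper safe to rerun.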

      Checked with 4.21.0-0.nightly-2026-02-02-085603 and 4.22.0-0.nightly-2026-01-26-181726, which do not include the fix for https://issues.redhat.com/browse/OCPBUGS-67303 (PR: https://github.com/openshift/prometheus-operator/pull/358), and compared them with 4.22.0-0.nightly-2026-02-02-081748, which includes the fix https://github.com/openshift/prometheus-operator/pull/358.

      On 4.21.0-0.nightly-2026-02-02-085603 and 4.22.0-0.nightly-2026-01-26-181726, monitoring is degraded with "unable to get secret \"my-workflow-webhook\": secrets \"my-workflow-webhook\" not found". On 4.22.0-0.nightly-2026-02-02-081748, monitoring is not degraded, as tested in https://issues.redhat.com/browse/OCPBUGS-67303?focusedId=28903702&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-28903702. If the cluster is then upgraded to another version, the upgrade is blocked by the missing secret, which may be too late for the customer to notice the problem. CI upgrade jobs would be marked as failed, and the job owner would need to analyze them.
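      A CI check for the expected behaviour could read only the Degraded condition of the monitoring ClusterOperator instead of parsing the `oc get co` table. This is a sketch that assumes `oc` is logged in to the cluster; the helper names are hypothetical, and the jsonpath filter is standard kubectl/oc syntax.

```shell
# Sketch: print the status of the monitoring ClusterOperator's Degraded condition.
monitoring_degraded_status() {
  # The jsonpath filter selects the condition whose type is "Degraded"
  oc get clusteroperator monitoring \
    -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}'
}

# Hypothetical CI helper: succeed only if monitoring reports Degraded=True,
# which is what this bug expects while an AlertmanagerConfig is invalid.
expect_monitoring_degraded() {
  _degraded=$(monitoring_degraded_status) || return 2
  if [ "$_degraded" = "True" ]; then
    echo "monitoring is Degraded, as expected"
    return 0
  fi
  echo "monitoring is not Degraded (Degraded=${_degraded:-unknown})" >&2
  return 1
}
```

      Return code 2 distinguishes "could not query the cluster" from "queried, but not Degraded" (1).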

      4.21.0-0.nightly-2026-02-02-085603

      $ oc -n openshift-user-workload-monitoring get cm user-workload-monitoring-config -oyaml
      apiVersion: v1
      data:
        config.yaml: |
          alertmanager:
            enabled: true
            enableAlertmanagerConfig: true
      kind: ConfigMap
      metadata:
        annotations:
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"v1","data":{"config.yaml":"alertmanager:\n  enabled: true\n  enableAlertmanagerConfig: true\n"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"user-workload-monitoring-config","namespace":"openshift-user-workload-monitoring"}}
        creationTimestamp: "2026-02-03T07:58:22Z"
        labels:
          app.kubernetes.io/managed-by: cluster-monitoring-operator
          app.kubernetes.io/part-of: openshift-monitoring
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
        resourceVersion: "40811"
        uid: 2c83db6b-79d6-4572-9d05-2a61322c1668
      
      $ date -u;oc -n openshift-user-workload-monitoring logs deploy/prometheus-operator | grep my-workflow-webhook | tail -n1
      Tue Feb  3 08:36:55 UTC 2026
      ts=2026-02-03T08:36:35.049504365Z level=error caller=/go/src/github.com/coreos/prometheus-operator/pkg/operator/resource_reconciler.go:678 msg="Unhandled Error" logger=UnhandledError err="sync \"openshift-user-workload-monitoring/user-workload\" failed: provision alertmanager configuration: failed to generate Alertmanager configuration: AlertmanagerConfig noodles/example: MSTeamsConfig[0]: unable to get secret \"my-workflow-webhook\": secrets \"my-workflow-webhook\" not found"
      
      $ oc get co monitoring
      NAME         VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      monitoring   4.21.0-0.nightly-2026-02-02-085603   False       True          True       10m     UpdatingUserWorkloadAlertmanager: waiting for Alertmanager User Workload object changes failed: waiting for Alertmanager openshift-user-workload-monitoring/user-workload: context deadline exceeded: condition Reconciled: status False: reason ReconciliationFailed: provision alertmanager configuration: failed to generate Alertmanager configuration: AlertmanagerConfig noodles/example: MSTeamsConfig[0]: unable to get secret "my-workflow-webhook": secrets "my-workflow-webhook" not found
      

      4.22.0-0.nightly-2026-01-26-181726

      $ oc -n openshift-user-workload-monitoring get cm user-workload-monitoring-config -oyaml
      apiVersion: v1
      data:
        config.yaml: |
          alertmanager:
            enabled: true
            enableAlertmanagerConfig: true
      kind: ConfigMap
      metadata:
        annotations:
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"v1","data":{"config.yaml":"alertmanager:\n  enabled: true\n  enableAlertmanagerConfig: true\n"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"user-workload-monitoring-config","namespace":"openshift-user-workload-monitoring"}}
        creationTimestamp: "2026-02-03T07:59:11Z"
        labels:
          app.kubernetes.io/managed-by: cluster-monitoring-operator
          app.kubernetes.io/part-of: openshift-monitoring
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
        resourceVersion: "94689"
        uid: bd3e17c6-2e8e-448a-b8e8-b170b8837a20
      
      $ date -u;oc -n openshift-user-workload-monitoring logs deploy/prometheus-operator | grep my-workflow-webhook | tail -n1
      Tue Feb  3 08:37:01 AM UTC 2026
      ts=2026-02-03T08:36:45.021356744Z level=error caller=/go/src/github.com/coreos/prometheus-operator/pkg/operator/resource_reconciler.go:678 msg="Unhandled Error" logger=UnhandledError err="sync \"openshift-user-workload-monitoring/user-workload\" failed: provision alertmanager configuration: failed to generate Alertmanager configuration: AlertmanagerConfig noodles/example: MSTeamsConfig[0]: unable to get secret \"my-workflow-webhook\": secrets \"my-workflow-webhook\" not found"
      
      $ oc get co monitoring
      NAME         VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      monitoring   4.22.0-0.nightly-2026-01-26-181726   False       True          True       11m     UpdatingUserWorkloadAlertmanager: waiting for Alertmanager User Workload object changes failed: waiting for Alertmanager openshift-user-workload-monitoring/user-workload: context deadline exceeded: condition Reconciled: status False: reason ReconciliationFailed: provision alertmanager configuration: failed to generate Alertmanager configuration: AlertmanagerConfig noodles/example: MSTeamsConfig[0]: unable to get secret "my-workflow-webhook": secrets "my-workflow-webhook" not found
      

      4.22.0-0.nightly-2026-02-02-081748

      $ oc -n openshift-user-workload-monitoring get cm user-workload-monitoring-config -oyaml
      apiVersion: v1
      data:
        config.yaml: |
          alertmanager:
            enabled: true
            enableAlertmanagerConfig: true
      kind: ConfigMap
      metadata:
        creationTimestamp: "2026-02-03T07:36:19Z"
        labels:
          app.kubernetes.io/managed-by: cluster-monitoring-operator
          app.kubernetes.io/part-of: openshift-monitoring
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
        resourceVersion: "36710"
        uid: c65ab372-e759-46fd-84e9-f297de979ab0
      
      $ date -u;oc -n openshift-user-workload-monitoring logs deploy/prometheus-operator | grep my-workflow-webhook | tail -n1
      Tue Feb  3 08:37:04 AM UTC 2026
      ts=2026-02-03T08:23:08.020926765Z level=info caller=/go/src/github.com/coreos/prometheus-operator/vendor/k8s.io/client-go/tools/events/event_broadcaster.go:338 msg="Event occurred" object.name=example object.namespace=noodles kind=AlertmanagerConfig apiVersion=monitoring.coreos.com/v1alpha1 type=Warning reason=InvalidConfiguration action=SelectingAlertmanagerConfigResources note="AlertmanagerConfig example was rejected due to invalid configuration: unable to get secret \"my-workflow-webhook\": secrets \"my-workflow-webhook\" not found"
      
      $ oc get co monitoring
      NAME         VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      monitoring   4.22.0-0.nightly-2026-02-02-081748   True        False         False      149m
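      With the fix, the rejection only surfaces as a Warning Event with reason InvalidConfiguration on the AlertmanagerConfig, as the prometheus-operator log line for 4.22.0-0.nightly-2026-02-02-081748 shows. The sketch below lists such events in a namespace; it assumes `oc` is logged in, and the helper names are hypothetical.

```shell
# Sketch: list events emitted for rejected AlertmanagerConfig objects in namespace $1.
# reason/kind values are taken from the prometheus-operator log above.
invalid_amconfig_events() {
  oc -n "$1" get events \
    --field-selector reason=InvalidConfiguration,involvedObject.kind=AlertmanagerConfig \
    -o jsonpath='{range .items[*]}{.involvedObject.name}{": "}{.message}{"\n"}{end}'
}

# Returns 0 if at least one such event exists, 1 if none, 2 if the query failed.
has_invalid_amconfig() {
  _events=$(invalid_amconfig_events "$1") || return 2
  [ -n "$_events" ]
}
```

      For example, `has_invalid_amconfig noodles` succeeds on the fixed build even though `oc get co monitoring` reports Degraded=False.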

      Version-Release number of selected component (if applicable):

      4.22 payloads that include the fix https://github.com/openshift/prometheus-operator/pull/358

      How reproducible:

      Always

      Steps to Reproduce:

      1. See the description above.

      Actual results:

      With a 4.22 payload that includes the fix https://github.com/openshift/prometheus-operator/pull/358, monitoring is not degraded.

      Expected results:

      With a 4.22 payload that includes the fix https://github.com/openshift/prometheus-operator/pull/358, monitoring is degraded.

      Additional info:

          

              janantha@redhat.com Jayapriya Pai
              juzhao@redhat.com Junqi Zhao