
[release-4.18] AlertmanagerConfig with missing options causes Alertmanager to crash

    • Bug
    • Resolution: Done-Errata
    • Normal
    • None
    • 4.13.0
    • Monitoring
    • Moderate
    • No
    • MON Sprint 264
    • 1
    • False
    • Previously, if the SMTP `smarthost` or `from` fields under the `emailConfigs` object were not specified at the global or receiver level in the `AlertmanagerConfig` custom resource (CR), Alertmanager would crash because these fields are required. With this release, the Prometheus Operator fails reconciliation if these fields are not specified. Therefore, the Prometheus Operator no longer pushes invalid configurations to Alertmanager, preventing it from crashing. (link:https://issues.redhat.com/browse/OCPBUGS-48050[*OCPBUGS-48050*])
    • Bug Fix
    • Done

      Description of problem:

      AlertmanagerConfig with missing options causes Alertmanager to crash

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Always

      Steps to Reproduce:

      A cluster administrator has enabled monitoring for user-defined projects in the Cluster Monitoring Operator (CMO) config map:
      
      ~~~
      config.yaml: |
        enableUserWorkload: true
        prometheusK8s:
          retention: 7d
      ~~~
      
      A cluster administrator has enabled alert routing for user-defined projects. 
      
      User workload monitoring (UWM) config map / CMO config map:
      
      ~~~
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
      data:
        config.yaml: |
          alertmanager:
            enabled: true 
            enableAlertmanagerConfig: true
      ~~~
      
      Verify the existing Alertmanager configuration:
      
      ~~~
      $ oc exec -n openshift-user-workload-monitoring alertmanager-user-workload-0 -- amtool config show --alertmanager.url http://localhost:9093  
      global:
        resolve_timeout: 5m
        http_config:
          follow_redirects: true
        smtp_hello: localhost
        smtp_require_tls: true
        pagerduty_url: https://events.pagerduty.com/v2/enqueue
        opsgenie_api_url: https://api.opsgenie.com/
        wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
        victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
        telegram_api_url: https://api.telegram.org
      route:
        receiver: Default
        group_by:
        - namespace
        continue: false
      receivers:
      - name: Default
      templates: []
      ~~~
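
      The `global` section above sets neither `smtp_smarthost` nor `smtp_from`. For comparison, a global section that defines them in the Alertmanager configuration might look like the sketch below. Both SMTP values are hypothetical placeholders, not taken from this cluster:

      ~~~
      global:
        resolve_timeout: 5m
        # Hypothetical SMTP defaults; email receivers that omit their own
        # smarthost/from fall back to these global values.
        smtp_smarthost: smtp.example.com:587
        smtp_from: alertmanager@example.com
        smtp_require_tls: true
      ~~~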
      
      Create an AlertmanagerConfig whose email receiver omits `from` and `smarthost`, while the global options "smtp_from" and "smtp_smarthost" are also unset:
      
      ~~~
      apiVersion: monitoring.coreos.com/v1alpha1
      kind: AlertmanagerConfig
      metadata:
        name: example
        namespace: example-namespace
      spec:
        receivers:
          - emailConfigs:
              - to: some.username@example.com
            name: custom-rules1
        route:
          matchers:
            - name: alertname
          receiver: custom-rules1
          repeatInterval: 1m
      ~~~
      
      Check the Alertmanager logs. The following error appears:
      
      ~~~
      ts=2023-09-05T12:07:33.449Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config_out/alertmanager.env.yaml err="no global SMTP smarthost set"
      ~~~ 

      Actual results:

      Alertmanager fails to restart.

      Expected results:

      The AlertmanagerConfig custom resource should be pre-validated.
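
      A configuration that would be expected to pass such validation sets both fields on the email receiver. The following is a minimal sketch based on the reproducer above. The `from` and `smarthost` values and the matcher value are hypothetical placeholders:

      ~~~
      apiVersion: monitoring.coreos.com/v1alpha1
      kind: AlertmanagerConfig
      metadata:
        name: example
        namespace: example-namespace
      spec:
        receivers:
          - name: custom-rules1
            emailConfigs:
              - to: some.username@example.com
                # Hypothetical values; needed here because no global
                # smtp_from / smtp_smarthost is configured.
                from: alertmanager@example.com
                smarthost: smtp.example.com:587
        route:
          receiver: custom-rules1
          repeatInterval: 1m
          matchers:
            - name: alertname
              value: ExampleAlert  # hypothetical alert name
      ~~~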

      Additional info:

      Reproducible with and without user workload Alertmanager.

              janantha@redhat.com Jayapriya Pai
              rhn-support-krg Kruthika G
              Junqi Zhao
              Eliska Romanova
              Simon Pasquier