Bug
Resolution: Done-Errata
Normal
4.13.0, 4.12.0
None
Description of problem:
AlertmanagerConfig with missing options causes Alertmanager to crash
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
A cluster administrator has enabled monitoring for user-defined projects.
CMO ConfigMap:
~~~
config.yaml: |
  enableUserWorkload: true
  prometheusK8s:
    retention: 7d
~~~
A cluster administrator has enabled alert routing for user-defined projects.
UWM ConfigMap / CMO ConfigMap:
~~~
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    alertmanager:
      enabled: true
      enableAlertmanagerConfig: true
~~~
Verify the existing config:
~~~
$ oc exec -n openshift-user-workload-monitoring alertmanager-user-workload-0 -- amtool config show --alertmanager.url http://localhost:9093
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
  smtp_hello: localhost
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
  telegram_api_url: https://api.telegram.org
route:
  receiver: Default
  group_by:
  - namespace
  continue: false
receivers:
- name: Default
templates: []
~~~
Create an AlertmanagerConfig without the options "smtp_from" and "smtp_smarthost":
~~~
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: example
  namespace: example-namespace
spec:
  receivers:
  - emailConfigs:
    - to: some.username@example.com
    name: custom-rules1
  route:
    matchers:
    - name: alertname
    receiver: custom-rules1
    repeatInterval: 1m
~~~
Check the Alertmanager logs. The following error is seen:
~~~
ts=2023-09-05T12:07:33.449Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config_out/alertmanager.env.yaml err="no global SMTP smarthost set"
~~~
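When scanning Alertmanager logs for this failure, it can help to split the line into fields and look at `level` and `err`. The following is a minimal sketch, assuming the logs use the logfmt-style `key=value` layout shown above. `parse_logfmt` is a hypothetical helper written for this illustration, not part of Alertmanager or `oc`.

```python
import shlex
from typing import Dict


def parse_logfmt(line: str) -> Dict[str, str]:
    """Split a logfmt-style line into key/value pairs; quoted values keep their spaces."""
    fields: Dict[str, str] = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value
            fields[key] = value
    return fields


# The log line from this report, copied verbatim.
line = (
    'ts=2023-09-05T12:07:33.449Z caller=coordinator.go:118 level=error '
    'component=configuration msg="Loading configuration file failed" '
    'file=/etc/alertmanager/config_out/alertmanager.env.yaml '
    'err="no global SMTP smarthost set"'
)

entry = parse_logfmt(line)
if entry.get("level") == "error":
    print(entry["msg"], "->", entry["err"])
```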
Actual results:
Alertmanager fails to restart.
Expected results:
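The AlertmanagerConfig resource should be pre-validated so that a configuration with missing options is rejected before Alertmanager loads it.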
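As an illustration of the kind of pre-validation meant here, the sketch below flags email receivers that set neither `smarthost` nor `from` when the global configuration provides no `smtp_smarthost` or `smtp_from` fallback. It is only a sketch under stated assumptions: `missing_smtp_settings` is a hypothetical function, the resource is passed in as a plain dict, and it does not represent how the operator actually validates resources.

```python
from typing import Any, Dict, List, Optional


def missing_smtp_settings(
    alertmanager_config: Dict[str, Any],
    global_config: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Return one message per email setting that has no value and no global fallback."""
    global_config = global_config or {}
    problems: List[str] = []
    receivers = alertmanager_config.get("spec", {}).get("receivers", [])
    for receiver in receivers:
        for index, email in enumerate(receiver.get("emailConfigs", [])):
            # (resource field, global Alertmanager option) pairs checked here.
            for field, global_key in (("smarthost", "smtp_smarthost"), ("from", "smtp_from")):
                if not email.get(field) and not global_config.get(global_key):
                    problems.append(
                        f"receiver {receiver.get('name')!r} emailConfigs[{index}]: "
                        f"no {field!r} and no global {global_key!r}"
                    )
    return problems


# The AlertmanagerConfig from the reproduction steps, as a plain dict.
example = {
    "apiVersion": "monitoring.coreos.com/v1alpha1",
    "kind": "AlertmanagerConfig",
    "metadata": {"name": "example", "namespace": "example-namespace"},
    "spec": {
        "receivers": [
            {"name": "custom-rules1", "emailConfigs": [{"to": "some.username@example.com"}]}
        ],
        "route": {
            "receiver": "custom-rules1",
            "repeatInterval": "1m",
            "matchers": [{"name": "alertname"}],
        },
    },
}

# SMTP-related keys from the global section of the amtool output in this report.
global_section = {"smtp_hello": "localhost", "smtp_require_tls": True}

for problem in missing_smtp_settings(example, global_section):
    print(problem)
```

With the resource from this report the function returns two problems, one for the smarthost and one for the sender address. It returns an empty list once either the receiver or the global section supplies both values.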
Additional info:
Reproducible with and without user workload Alertmanager.
- blocks: OCPBUGS-48050 [release-4.18] AlertmanagerConfig with missing options causes Alertmanager to crash (Closed)
- is cloned by: OCPBUGS-48050 [release-4.18] AlertmanagerConfig with missing options causes Alertmanager to crash (Closed)
- links to: RHEA-2024:11038 OpenShift Container Platform 4.19.z bug fix update