Details

Type: Bug
Resolution: Won't Do
Priority: Normal
Affects Version: 4.12
Severity: Moderate
Description
Description of problem:
Provide a better message in the CMO Degraded/Unavailable conditions when the Alertmanager pods can't be scheduled.
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2022-08-24-053339
How reproducible:
always
Steps to Reproduce:
1. Configure alertmanager with an invalid volume claim template (an unknown storage class, for instance); one way to apply this ConfigMap is sketched after these steps:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      volumeClaimTemplate:
        metadata:
          name: monitorpvc
        spec:
          storageClassName: foo
          volumeMode: Filesystem
          resources:
            requests:
              storage: 1Gi

2. Wait for CMO to go degraded.
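One way to apply the ConfigMap above and watch the Alertmanager pods stay Pending (a sketch; the file name cluster-monitoring-config.yaml is arbitrary):

% oc apply -f cluster-monitoring-config.yaml
% oc -n openshift-monitoring get pods | grep alertmanager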
Actual results:
% oc -n openshift-monitoring describe pod alertmanager-main-0 | tail -n 10
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age  From               Message
  ----     ------            ---  ----               -------
  Warning  FailedScheduling  13m  default-scheduler  0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  13m  default-scheduler  0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.

% oc get co monitoring -o jsonpath='{.status.conditions}' | jq 'map(select(.type=="Degraded" or .type=="Available"))'
[
  {
    "lastTransitionTime": "2022-08-25T08:51:55Z",
    "reason": "AsExpected",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2022-08-25T08:51:55Z",
    "message": "waiting for Alertmanager object changes failed: waiting for Alertmanager openshift-monitoring/main: expected 2 replicas, got 0 updated replicas",
    "reason": "UpdatingAlertmanagerFailed",
    "status": "True",
    "type": "Degraded"
  }
]
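The scheduling failure can also be confirmed on the PVCs directly; with the unknown storage class they stay Pending (illustrative output, not captured from the cluster above; the PVC names follow the usual StatefulSet pattern of template name plus pod name):

% oc -n openshift-monitoring get pvc
NAME                             STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
monitorpvc-alertmanager-main-0   Pending                                      foo            13m
monitorpvc-alertmanager-main-1   Pending                                      foo            13m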
Expected results:
CMO should surface a better explanation of why the pods aren't in the desired state, and the Available condition should be False.
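For illustration only (a hypothetical sketch, not actual CMO output; the message wording is an assumption), conditions along these lines would meet that expectation:

[
  {
    "message": "Alertmanager openshift-monitoring/main: 0 of 2 pods scheduled: pod alertmanager-main-0 has unbound PersistentVolumeClaims (storage class \"foo\")",
    "reason": "UpdatingAlertmanagerFailed",
    "status": "False",
    "type": "Available"
  },
  {
    "message": "Alertmanager openshift-monitoring/main: 0 of 2 pods scheduled: pod alertmanager-main-0 has unbound PersistentVolumeClaims (storage class \"foo\")",
    "reason": "UpdatingAlertmanagerFailed",
    "status": "True",
    "type": "Degraded"
  }
]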
Additional info:
The fix for the following bug makes co monitoring reflect the status of the Prometheus pods/Prometheus Operator now, but co monitoring still fails to reflect the status of the Alertmanager pods: https://bugzilla.redhat.com/show_bug.cgi?id=2043518