Type: Bug
Resolution: Won't Do
Priority: Normal
Affects Version: 4.12
Category: Quality / Stability / Reliability
Severity: Moderate
Description of problem:
CMO should provide a clearer message in its Degraded/Unavailable conditions when the Alertmanager pods can't be scheduled.
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2022-08-24-053339
How reproducible:
always
Steps to Reproduce:
1. Configure Alertmanager with an invalid volume claim template (an unknown storage class, for instance); the apply command is shown after step 2:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      volumeClaimTemplate:
        metadata:
          name: monitorpvc
        spec:
          storageClassName: foo
          volumeMode: Filesystem
          resources:
            requests:
              storage: 1Gi
2. Wait for CMO to go degraded
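For reference, assuming the ConfigMap above is saved to a file such as cluster-monitoring-config.yaml (an arbitrary name), it can be applied with:
% oc apply -f cluster-monitoring-config.yaml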
Actual results:
% oc -n openshift-monitoring describe pod alertmanager-main-0 |tail -n 10
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 13m default-scheduler 0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
Warning FailedScheduling 13m default-scheduler 0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
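The root cause is also visible on the PersistentVolumeClaim created from the volume claim template, which stays Pending because the foo storage class doesn't exist (output omitted, it depends on the cluster):
% oc -n openshift-monitoring get pvc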
% oc get co monitoring -o jsonpath='{.status.conditions}' | jq 'map(select(.type=="Degraded" or .type=="Available"))'
[
{
"lastTransitionTime": "2022-08-25T08:51:55Z",
"reason": "AsExpected",
"status": "True",
"type": "Available"
},
{
"lastTransitionTime": "2022-08-25T08:51:55Z",
"message": "waiting for Alertmanager object changes failed: waiting for Alertmanager openshift-monitoring/main: expected 2 replicas, got 0 updated replicas",
"reason": "UpdatingAlertmanagerFailed",
"status": "True",
"type": "Degraded"
}
]
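For completeness, the replica mismatch in the Degraded message corresponds to the alertmanager-main statefulset never reaching its desired replica count, which can be checked with (output omitted):
% oc -n openshift-monitoring get statefulset alertmanager-main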
Expected results:
CMO should surface a better explanation of why the pods aren't in the desired state. The Available status should be False.
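Purely for illustration (the exact wording below is an assumption, not prescribed by this report), the Degraded message could surface the scheduler's event, along these lines:
{
  "message": "waiting for Alertmanager openshift-monitoring/main: expected 2 replicas, got 0 updated replicas: pod alertmanager-main-0 is Pending: 0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims",
  "reason": "UpdatingAlertmanagerFailed",
  "status": "True",
  "type": "Degraded"
}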
Additional info:
The fix for https://bugzilla.redhat.com/show_bug.cgi?id=2043518 made the monitoring ClusterOperator reflect the status of the Prometheus pods and the Prometheus Operator, but it still fails to reflect the status of the Alertmanager pods.