Details
Bug
Resolution: Not a Bug
4.14.0
Description
Description of problem:
Create the following PrometheusRule and Pod in the openshift-monitoring project to trigger the PodFailedToStart alert. Note that there is no "\" before $labels.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: auto-test-rules
  namespace: openshift-monitoring
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: PodFailedToStart
      annotations:
        description: Pod {{ $labels.namespace }}/{{ $labels.pod }} on node {{ $labels.node }} has been restarted for more than 1 times within one minute.
      expr: sum by(pod, namespace) (kube_pod_status_ready{condition="true",namespace="openshift-monitoring"}) * on(pod, namespace) group_right() kube_pod_info == 0
      labels:
        severity: critical
---
apiVersion: v1
kind: Pod
metadata:
  name: crash-pod
  namespace: openshift-monitoring
spec:
  containers:
  - name: crash-app
    image: quay.io/openshifttest/crashpod
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  restartPolicy: Always
"$labels" is dropped from the created PrometheusRule:
$ oc -n openshift-monitoring get prometheusrules auto-test-rules -oyaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: "2023-05-24T02:17:25Z"
  generation: 1
  name: auto-test-rules
  namespace: openshift-monitoring
  resourceVersion: "86918"
  uid: e601ed9b-553f-4ca0-ab41-197a4394714e
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: PodFailedToStart
      annotations:
        description: Pod {{ .namespace }}/{{ .pod }} on node {{ .node }} has been restarted for more than 1 times within one minute.
      expr: sum by(pod, namespace) (kube_pod_status_ready{condition="true",namespace="openshift-monitoring"}) * on(pod, namespace) group_right() kube_pod_info == 0
      labels:
        severity: critical
As a result, the alert's annotations.description cannot be expanded and the alert carries this error instead: "<error expanding template: error executing template __alert_PodFailedToStart: template: __alert_PodFailedToStart:1:119: executing \"__alert_PodFailedToStart\" at <.namespace>: can't evaluate field namespace in type struct { Labels map[string]string; ExternalLabels map[string]string; ExternalURL string; Value float64 }>"
$ oc -n openshift-monitoring get pod crash-pod -o wide
NAME        READY   STATUS             RESTARTS        AGE   IP            NODE                                          NOMINATED NODE   READINESS GATES
crash-pod   0/1     CrashLoopBackOff   8 (4m47s ago)   20m   10.131.0.38   ip-10-0-143-17.ca-central-1.compute.internal   <none>           <none>
$ token=`oc create token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v2/alerts?&filter={alertname="PodFailedToStart"}' | jq
[
  {
    "annotations": {
      "description": "<error expanding template: error executing template __alert_PodFailedToStart: template: __alert_PodFailedToStart:1:119: executing \"__alert_PodFailedToStart\" at <.namespace>: can't evaluate field namespace in type struct { Labels map[string]string; ExternalLabels map[string]string; ExternalURL string; Value float64 }>"
    },
    "endsAt": "2023-05-24T02:40:50.117Z",
    "fingerprint": "5ea1ff8bb73f6c9b",
    "receivers": [
      {
        "name": "Critical"
      }
    ],
    ...
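The symptom above (a template action such as {{ .namespace }} where {{ $labels.namespace }} was intended) can be caught before the alert fires. The following is a minimal sketch, not part of the original report: has_stripped_labels is a hypothetical helper name, and the pattern assumes that a bare lowercase field like ".namespace" is never intentional, since the fields listed in the error message (Labels, ExternalLabels, ExternalURL, Value) all start with an uppercase letter.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): returns 0 when the given text
# contains a template action of the form {{ .lowercase }}, which is what
# remains after "$labels" has been stripped from {{ $labels.lowercase }}.
has_stripped_labels() {
    printf '%s\n' "$1" | grep -Eq '\{\{ *\.[a-z_]+ *\}\}'
}

# Usage: check an annotation string before creating the rule.
annotation='Pod {{ .namespace }}/{{ .pod }} on node {{ .node }}'
if has_stripped_labels "$annotation"; then
    echo "annotation looks like it lost its \$labels prefix"
fi
```

The same function could be fed the output of the "oc get prometheusrules ... -oyaml" command shown above to check a rule that has already been stored.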
Remove the created PrometheusRule, add "\" before every "$labels" in annotations.description, and create the PrometheusRule again. This time $labels is not dropped:
$ oc -n openshift-monitoring get prometheusrules auto-test-rules -oyaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: "2023-05-24T03:09:39Z"
  generation: 1
  name: auto-test-rules
  namespace: openshift-monitoring
  resourceVersion: "104321"
  uid: 24f3aa2d-fc9f-46e6-a026-c7bb9e6471f7
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: PodFailedToStart
      annotations:
        description: Pod {{ $labels.namespace }}/{{ $labels.pod }} on node {{ $labels.node }} has been restarted for more than 1 times within one minute.
      expr: sum by(pod, namespace) (kube_pod_status_ready{condition="true",namespace="openshift-monitoring"}) * on(pod, namespace) group_right() kube_pod_info == 0
      labels:
        severity: critical
and annotations.description is expanded correctly:
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v2/alerts?&filter={alertname="PodFailedToStart"}' | jq
[
  {
    "annotations": {
      "description": "Pod openshift-monitoring/crash-pod on node ip-10-0-143-17.ca-central-1.compute.internal has been restarted for more than 1 times within one minute."
    },
    "endsAt": "2023-05-24T03:09:50.117Z",
    "fingerprint": "5ea1ff8bb73f6c9b",
    "receivers": [
      {
        "name": "Critical"
      }
    ],
    ...
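One plausible explanation for the behavior above, offered as an assumption because the report does not say how the manifest was passed to oc: if the YAML is fed through an unquoted shell here-document, the shell itself expands $labels (an unset variable) to an empty string before oc ever sees it. That would produce exactly the {{ .namespace }} form seen in the stored rule, and it would also explain why a "\" before $labels helps. The sketch below reproduces the effect with cat only, so it needs no cluster; the variable names unquoted, escaped, and quoted are illustrative.

```shell
#!/bin/sh
# Ensure "labels" is unset, as it would be in a typical interactive shell.
unset labels

# Unquoted delimiter: the shell expands $labels inside the here-document,
# leaving only ".namespace" and ".pod" behind.
unquoted=$(cat <<EOF
description: Pod {{ $labels.namespace }}/{{ $labels.pod }}
EOF
)

# Backslash before the dollar sign: expansion is suppressed for that token.
escaped=$(cat <<EOF
description: Pod {{ \$labels.namespace }}/{{ \$labels.pod }}
EOF
)

# Quoted delimiter: no expansion happens anywhere in the here-document,
# so no backslashes are needed.
quoted=$(cat <<'EOF'
description: Pod {{ $labels.namespace }}/{{ $labels.pod }}
EOF
)

printf '%s\n' "$unquoted" "$escaped" "$quoted"
```

If this assumption holds, quoting the here-document delimiter or applying the manifest from a file with "oc apply -f" would avoid the need to escape each $labels by hand.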
Version-Release number of selected component (if applicable):
$ oc version
Client Version: 4.14.0-0.nightly-2023-05-23-103225
Kustomize Version: v4.5.7
Server Version: 4.14.0-0.nightly-2023-05-23-103225
Kubernetes Version: v1.27.1+38c64ac
How reproducible:
Always
Steps to Reproduce:
1. See the description.
Actual results:
A "\" must be added before each $labels in annotations.description of the PrometheusRule; otherwise $labels is dropped.
Expected results:
Additional info:
This does not appear to be a bug. If that is confirmed, the issue can be closed.