OpenShift Bugs / OCPBUGS-13987

need to add "\" before $labels in annotations.description of PrometheusRule, otherwise $labels would be dropped


Details

    • Bug
    • Resolution: Not a Bug
    • Undefined
    • None
    • 4.14.0
    • Monitoring
    • None
    • No
    • False

    Description

      Description of problem:

      Create the following PrometheusRule and Pod in the openshift-monitoring project to trigger the PodFailedToStart alert. Note that there is no "\" before $labels:

      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: auto-test-rules
        namespace: openshift-monitoring
      spec:
        groups:
          - name: alerting rules
            rules:
              - alert: PodFailedToStart
                annotations:
                  description: Pod {{ $labels.namespace }}/{{ $labels.pod }} on node {{ $labels.node }} has been restarted for more than 1 times within one minute.
                expr: sum by(pod, namespace) (kube_pod_status_ready{condition="true",namespace="openshift-monitoring"}) * on(pod, namespace) group_right() kube_pod_info == 0
                labels:
                  severity: critical
      ---
      apiVersion: v1
      kind: Pod
      metadata:
        name: crash-pod
        namespace: openshift-monitoring
      spec:
        containers:
          - name: crash-app
            image: quay.io/openshifttest/crashpod
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop:
                - ALL
        securityContext:
          runAsNonRoot: true
          seccompProfile:
            type: RuntimeDefault
        restartPolicy: Always

      "$labels" is dropped from the created PrometheusRule:

      $ oc -n openshift-monitoring get prometheusrules auto-test-rules -oyaml
      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        creationTimestamp: "2023-05-24T02:17:25Z"
        generation: 1
        name: auto-test-rules
        namespace: openshift-monitoring
        resourceVersion: "86918"
        uid: e601ed9b-553f-4ca0-ab41-197a4394714e
      spec:
        groups:
        - name: alerting rules
          rules:
          - alert: PodFailedToStart
            annotations:
              description: Pod {{ .namespace }}/{{ .pod }} on node {{ .node }} has been
                restarted for more than 1 times within one minute.
            expr: sum by(pod, namespace) (kube_pod_status_ready{condition="true",namespace="openshift-monitoring"})
              * on(pod, namespace) group_right() kube_pod_info == 0
            labels:
              severity: critical 
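
The stored rule above shows the symptom directly: each template action begins with a bare field such as `{{ .namespace }}`. The error text in this report lists the fields available to the template (Labels, ExternalLabels, ExternalURL, Value), so a bare `.namespace` has nothing to resolve against. A minimal sketch for spotting this pattern follows; `find_bare_fields` is a hypothetical helper name, not part of `oc` or Prometheus, and the regular expression is only a heuristic.

```shell
#!/bin/sh
# Hypothetical helper: reads a PrometheusRule manifest on stdin and prints
# (with line numbers) every line containing a template action that is just a
# bare lowercase field, e.g. "{{ .namespace }}". Such a line suggests that
# "$labels" was lost before the rule reached the cluster.
find_bare_fields() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -nE '\{\{ *\.[a-z_]+ *\}\}' || true
}

# Usage against the cluster (same command as in this report):
#   oc -n openshift-monitoring get prometheusrules auto-test-rules -oyaml | find_bare_fields
```

An empty result does not prove the rule is correct; it only means this particular pattern was not found.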

      The alert's annotations.description cannot be expanded; the error is: "<error expanding template: error executing template __alert_PodFailedToStart: template: __alert_PodFailedToStart:1:119: executing \"__alert_PodFailedToStart\" at <.namespace>: can't evaluate field namespace in type struct { Labels map[string]string; ExternalLabels map[string]string; ExternalURL string; Value float64 }>"

      $ oc -n openshift-monitoring get pod crash-pod -o wide
      NAME        READY   STATUS             RESTARTS        AGE   IP            NODE                                           NOMINATED NODE   READINESS GATES
      crash-pod   0/1     CrashLoopBackOff   8 (4m47s ago)   20m   10.131.0.38   ip-10-0-143-17.ca-central-1.compute.internal   <none>           <none>
      
      $ token=`oc create token prometheus-k8s -n openshift-monitoring`
      $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v2/alerts?&filter={alertname="PodFailedToStart"}' | jq
      [
        {
          "annotations": {
            "description": "<error expanding template: error executing template __alert_PodFailedToStart: template: __alert_PodFailedToStart:1:119: executing \"__alert_PodFailedToStart\" at <.namespace>: can't evaluate field namespace in type struct { Labels map[string]string; ExternalLabels map[string]string; ExternalURL string; Value float64 }>"
          },
          "endsAt": "2023-05-24T02:40:50.117Z",
          "fingerprint": "5ea1ff8bb73f6c9b",
          "receivers": [
            {
              "name": "Critical"
            }
          ],
      ...
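
The same failure can be checked for mechanically in the Alertmanager response. This is a rough sketch: `has_template_error` is a hypothetical helper name, and it simply searches for the "error expanding template" string seen in the output above rather than parsing the JSON.

```shell
#!/bin/sh
# Hypothetical helper: reads the Alertmanager /api/v2/alerts response on stdin
# and exits 0 if any annotation carries a template expansion error,
# non-zero otherwise.
has_template_error() {
  grep -q 'error expanding template'
}

# Usage, assuming $token was created as in this report:
#   oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
#     curl -k -H "Authorization: Bearer $token" \
#     'https://alertmanager-main.openshift-monitoring.svc:9094/api/v2/alerts?&filter={alertname="PodFailedToStart"}' \
#     | has_template_error && echo "annotation template failed to expand"
```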

      Remove the created PrometheusRule, add "\" before every "$labels" in annotations.description, and create the PrometheusRule again. This time $labels is not dropped:

      $ oc -n openshift-monitoring get prometheusrules auto-test-rules -oyaml
      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        creationTimestamp: "2023-05-24T03:09:39Z"
        generation: 1
        name: auto-test-rules
        namespace: openshift-monitoring
        resourceVersion: "104321"
        uid: 24f3aa2d-fc9f-46e6-a026-c7bb9e6471f7
      spec:
        groups:
        - name: alerting rules
          rules:
          - alert: PodFailedToStart
            annotations:
              description: Pod {{ $labels.namespace }}/{{ $labels.pod }} on node {{ $labels.node
                }} has been restarted for more than 1 times within one minute.
            expr: sum by(pod, namespace) (kube_pod_status_ready{condition="true",namespace="openshift-monitoring"})
              * on(pod, namespace) group_right() kube_pod_info == 0
            labels:
              severity: critical

      and annotations.description is expanded correctly:

      $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v2/alerts?&filter={alertname="PodFailedToStart"}' | jq
      [
        {
          "annotations": {
            "description": "Pod openshift-monitoring/crash-pod on node ip-10-0-143-17.ca-central-1.compute.internal has been restarted for more than 1 times within one minute."
          },
          "endsAt": "2023-05-24T03:09:50.117Z",
          "fingerprint": "5ea1ff8bb73f6c9b",
          "receivers": [
            {
              "name": "Critical"
            }
          ],
      ...
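
The report does not say how the manifest was passed to the cluster. One explanation consistent with both observations, offered here as an assumption rather than a confirmed cause, is that the YAML went through a shell construct that expands `$labels` (for example, an unquoted here-document) before `oc` ever saw it. The sketch below reproduces that effect locally without a cluster; the variable names are illustrative only.

```shell
#!/bin/sh
# Assumption: the manifest text is produced by the shell. The report does not
# confirm this; the sketch only shows that shell expansion yields the same
# symptom.

labels=''  # stands in for the normally unset variable; keeps the sketch working under "set -u"

# Unquoted delimiter: the shell expands $labels (empty) before the text is
# written, leaving "{{ .namespace }}".
unquoted=$(cat <<EOF
description: Pod {{ $labels.namespace }}/{{ $labels.pod }}
EOF
)

# Backslash before "$": the shell emits a literal $labels.
escaped=$(cat <<EOF
description: Pod {{ \$labels.namespace }}/{{ \$labels.pod }}
EOF
)

# Quoted delimiter: no expansion at all, so no backslashes are needed.
quoted=$(cat <<'EOF'
description: Pod {{ $labels.namespace }}/{{ $labels.pod }}
EOF
)

printf '%s\n' "$unquoted" "$escaped" "$quoted"
# Prints:
#   description: Pod {{ .namespace }}/{{ .pod }}
#   description: Pod {{ $labels.namespace }}/{{ $labels.pod }}
#   description: Pod {{ $labels.namespace }}/{{ $labels.pod }}
```

If this is what happened, applying the manifest from a file, or quoting the here-document delimiter, would avoid the need for the backslash.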

      Version-Release number of selected component (if applicable):

      $ oc version
      Client Version: 4.14.0-0.nightly-2023-05-23-103225
      Kustomize Version: v4.5.7
      Server Version: 4.14.0-0.nightly-2023-05-23-103225
      Kubernetes Version: v1.27.1+38c64ac
      

      How reproducible:

      always

      Steps to Reproduce:

      1. See the description.
      

      Actual results:

      A "\" must be added before $labels in annotations.description of the PrometheusRule; otherwise $labels is dropped.

      Expected results:

       

      Additional info:

      It seems this is not a bug; if so, we can close it.


          People

            spasquie@redhat.com Simon Pasquier
            juzhao@redhat.com Junqi Zhao