Type: Bug
Resolution: Not a Bug
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.14.0
Component/s: Monitoring
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
None
Regression:
No

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Create the following PrometheusRule and Pod in the openshift-monitoring project to trigger the PodFailedToStart alert. Note that there is no "\" before $labels:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: auto-test-rules
  namespace: openshift-monitoring
spec:
  groups:
    - name: alerting rules
      rules:
        - alert: PodFailedToStart
          annotations:
            description: Pod {{ $labels.namespace }}/{{ $labels.pod }} on node {{ $labels.node }} has been restarted for more than 1 times within one minute.
          expr: sum by(pod, namespace) (kube_pod_status_ready{condition="true",namespace="openshift-monitoring"}) * on(pod, namespace) group_right() kube_pod_info == 0
          labels:
            severity: critical
---
apiVersion: v1
kind: Pod
metadata:
  name: crash-pod
  namespace: openshift-monitoring
spec:
  containers:
    - name: crash-app
      image: quay.io/openshifttest/crashpod
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
          - ALL
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  restartPolicy: Always

"$labels" is dropped from the created PrometheusRule:

$ oc -n openshift-monitoring get prometheusrules auto-test-rules -oyaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: "2023-05-24T02:17:25Z"
  generation: 1
  name: auto-test-rules
  namespace: openshift-monitoring
  resourceVersion: "86918"
  uid: e601ed9b-553f-4ca0-ab41-197a4394714e
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: PodFailedToStart
      annotations:
        description: Pod {{ .namespace }}/{{ .pod }} on node {{ .node }} has been
          restarted for more than 1 times within one minute.
      expr: sum by(pod, namespace) (kube_pod_status_ready{condition="true",namespace="openshift-monitoring"})
        * on(pod, namespace) group_right() kube_pod_info == 0
      labels:
        severity: critical

The alert's annotations.description is not expanded correctly; error: "<error expanding template: error executing template __alert_PodFailedToStart: template: __alert_PodFailedToStart:1:119: executing \"__alert_PodFailedToStart\" at <.namespace>: can't evaluate field namespace in type struct { Labels map[string]string; ExternalLabels map[string]string; ExternalURL string; Value float64 }>"

$ oc -n openshift-monitoring get pod crash-pod -o wide
NAME        READY   STATUS             RESTARTS        AGE   IP            NODE                                           NOMINATED NODE   READINESS GATES
crash-pod   0/1     CrashLoopBackOff   8 (4m47s ago)   20m   10.131.0.38   ip-10-0-143-17.ca-central-1.compute.internal   <none>           <none>

$ token=`oc create token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v2/alerts?&filter={alertname="PodFailedToStart"}' | jq
[
  {
    "annotations": {
      "description": "<error expanding template: error executing template __alert_PodFailedToStart: template: __alert_PodFailedToStart:1:119: executing \"__alert_PodFailedToStart\" at <.namespace>: can't evaluate field namespace in type struct { Labels map[string]string; ExternalLabels map[string]string; ExternalURL string; Value float64 }>"
    },
    "endsAt": "2023-05-24T02:40:50.117Z",
    "fingerprint": "5ea1ff8bb73f6c9b",
    "receivers": [
      {
        "name": "Critical"
      }
    ],
...

Remove the created PrometheusRule, add "\" before every "$labels" in annotations.description, and create the PrometheusRule again. This time $labels is not dropped:

$ oc -n openshift-monitoring get prometheusrules auto-test-rules -oyaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: "2023-05-24T03:09:39Z"
  generation: 1
  name: auto-test-rules
  namespace: openshift-monitoring
  resourceVersion: "104321"
  uid: 24f3aa2d-fc9f-46e6-a026-c7bb9e6471f7
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: PodFailedToStart
      annotations:
        description: Pod {{ $labels.namespace }}/{{ $labels.pod }} on node {{ $labels.node
          }} has been restarted for more than 1 times within one minute.
      expr: sum by(pod, namespace) (kube_pod_status_ready{condition="true",namespace="openshift-monitoring"})
        * on(pod, namespace) group_right() kube_pod_info == 0
      labels:
        severity: critical

The annotations.description is now expanded correctly:

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v2/alerts?&filter={alertname="PodFailedToStart"}' | jq
[
  {
    "annotations": {
      "description": "Pod openshift-monitoring/crash-pod on node ip-10-0-143-17.ca-central-1.compute.internal has been restarted for more than 1 times within one minute."
    },
    "endsAt": "2023-05-24T03:09:50.117Z",
    "fingerprint": "5ea1ff8bb73f6c9b",
    "receivers": [
      {
        "name": "Critical"
      }
    ],
...
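
The report does not say how the manifest was passed to oc. A minimal sketch of one mechanism that produces exactly this difference, assuming the YAML went through an unquoted shell heredoc (an assumption, not confirmed in this issue): the shell expands the unset variable $labels to an empty string before oc sees the text, and a backslash suppresses that expansion. The variable names `unescaped` and `escaped` are illustrative only.

```shell
#!/bin/sh
# Assumption: the manifest reaches `oc` through an unquoted heredoc.
# `labels` is not set in the shell, so $labels expands to an empty string.
unset labels

# Unquoted heredoc, no backslash: the shell rewrites the template text.
unescaped=$(cat <<EOF
description: Pod {{ $labels.namespace }}/{{ $labels.pod }}
EOF
)

# Unquoted heredoc with a backslash: \$ is passed through as a literal $.
escaped=$(cat <<EOF
description: Pod {{ \$labels.namespace }}/{{ \$labels.pod }}
EOF
)

printf '%s\n' "$unescaped"   # description: Pod {{ .namespace }}/{{ .pod }}
printf '%s\n' "$escaped"     # description: Pod {{ $labels.namespace }}/{{ $labels.pod }}
```

Under that assumption the API server stores whatever text the shell produced, which would match the two stored PrometheusRule objects shown in this description.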

Version-Release number of selected component (if applicable):

$ oc version
Client Version: 4.14.0-0.nightly-2023-05-23-103225
Kustomize Version: v4.5.7
Server Version: 4.14.0-0.nightly-2023-05-23-103225
Kubernetes Version: v1.27.1+38c64ac

How reproducible:

always

Steps to Reproduce:

1. See the description.

Actual results:

A "\" must be added before $labels in annotations.description of the PrometheusRule; otherwise $labels is dropped.

Expected results:

Additional info:

This does not seem to be a bug; if so, the issue can be closed.
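
If the cause is shell expansion in a heredoc (an assumption; this issue does not state how the manifest was created), quoting the heredoc delimiter is an alternative to escaping each $labels, and a simple grep can flag text whose template lost its variable. The sketch below uses a hypothetical helper, `has_dropped_var`, and a pattern that only catches the `{{ .field }}` shape seen in this report.

```shell
#!/bin/sh
# Quoted delimiter ('EOF'): the shell performs no expansion inside the heredoc,
# so no backslashes are needed.
manifest=$(cat <<'EOF'
description: Pod {{ $labels.namespace }}/{{ $labels.pod }} on node {{ $labels.node }}
EOF
)

# Hypothetical helper: succeeds if the text contains "{{ .x" with a lowercase
# field name. The fields listed in the template error in this report
# (Labels, ExternalLabels, ExternalURL, Value) are all capitalized, so a
# lowercase field right after "{{ ." hints at a dropped variable.
has_dropped_var() {
  printf '%s\n' "$1" | grep -Eq '\{\{ *\.[a-z_]'
}

if has_dropped_var "$manifest"; then
  echo "template variable appears to have been dropped"
else
  echo "template variables intact"
fi
```

The check is a heuristic for this one symptom, not a validator for Prometheus templates.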

Assignee: Simon Pasquier

Reporter: Junqi Zhao

Need Info From: None

Contributors: None

QA Contact: Junqi Zhao

Doc Contact: None

Votes: 0

Watchers: 1

Created: 2023/05/24 3:31 AM

Updated: 2025/07/26 11:27 PM

Resolved: 2023/05/24 6:32 AM
