Observability UI / OU-124

KubePodNotScheduled alert description is not fully shown on admin console

    • Bug
    • Resolution: Duplicate
    • OpenShift 4.12, OpenShift 4.13
    • Admin-Console
    • Quality / Stability / Reliability

      Per OCPBUGS-4431, the KubePodNotScheduled alert was added in 4.12:

              - alert: KubePodNotScheduled
                annotations:
                  description: |-
                    Pod {{ $labels.namespace }}/{{ $labels.pod }} cannot be scheduled for more than 30 minutes.
                    Check the details of the pod with the following command:
                    oc describe -n {{ $labels.namespace }} pod {{ $labels.pod }}
                  summary: Pod cannot be scheduled.
                expr: last_over_time(kube_pod_status_unschedulable{namespace=~"(openshift-.*|kube-.*|default)"}[5m])
                  == 1
                for: 30m
                labels:
                  severity: warning

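      Tested on 4.12.0-0.nightly-2022-12-04-160656, using the following Deployment to trigger the alert (its nodeSelector, disktype: ssd, presumably matches no node, so the pod stays unschedulable):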

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        labels:
          app: prometheus-example-app
        name: prometheus-example-app
        namespace: default
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: prometheus-example-app
        template:
          metadata:
            labels:
              app: prometheus-example-app
          spec:
            containers:
            - image: ghcr.io/rhobs/prometheus-example-app:0.4.0
              imagePullPolicy: IfNotPresent
              name: prometheus-example-app
              securityContext:
                allowPrivilegeEscalation: false
                capabilities:
                  drop:
                  - ALL
            nodeSelector:
              disktype: ssd
            securityContext:
              runAsNonRoot: true
              seccompProfile:
                type: RuntimeDefault

      The Alertmanager API returns the correct annotations.description for the KubePodNotScheduled alert:

      # token=`oc create token prometheus-k8s -n openshift-monitoring`
      # oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v2/alerts' | jq
      ...
        {
          "annotations": {
            "description": "Pod default/prometheus-example-app-766ddd6bc4-5s4j2 cannot be scheduled for more than 30 minutes.\nCheck the details of the pod with the following command:\noc describe -n default pod prometheus-example-app-766ddd6bc4-5s4j2",
            "summary": "Pod cannot be scheduled."
          },
          "endsAt": "2022-12-05T03:27:20.407Z",
          "fingerprint": "3e90a7d55af10e4e",
          "receivers": [
            {
              "name": "Default"
            }
          ],
          "startsAt": "2022-12-05T03:20:20.407Z",
          "status": {
            "inhibitedBy": [],
            "silencedBy": [],
            "state": "active"
          },
          "updatedAt": "2022-12-05T03:23:20.486Z",
          "generatorURL": "https:/console-openshift-console.apps.qe-ui412-1205.qe.devcluster.openshift.com/monitoring/graph?g0.expr=last_over_time%28kube_pod_status_unschedulable%7Bnamespace%3D~%22%28openshift-.%2A%7Ckube-.%2A%7Cdefault%29%22%7D%5B5m%5D%29+%3D%3D+1&g0.tab=1",
          "labels": {
            "alertname": "KubePodNotScheduled",
            "container": "kube-rbac-proxy-main",
            "endpoint": "https-main",
            "job": "kube-state-metrics",
            "namespace": "default",
            "openshift_io_alert_source": "platform",
            "pod": "prometheus-example-app-766ddd6bc4-5s4j2",
            "prometheus": "openshift-monitoring/k8s",
            "service": "kube-state-metrics",
            "severity": "warning",
            "uid": "f1da7d94-6451-4916-a476-d8ad7c3768ab"
          } 
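
      To compare the API value with what the console renders, it can help to print only the description annotation. The following is a minimal sketch, not part of the original report: the helper name alert_description is hypothetical, jq is assumed to be installed, and the sample input is a trimmed copy of the response above.

```shell
# Hypothetical helper: read an Alertmanager /api/v2/alerts response on stdin
# and print the description of every KubePodNotScheduled alert.
alert_description() {
  jq -r '.[]
         | select(.labels.alertname == "KubePodNotScheduled")
         | .annotations.description'
}

# Trimmed sample of the response shown above (only the fields the filter reads).
sample='[{"annotations":{"description":"Pod default/prometheus-example-app-766ddd6bc4-5s4j2 cannot be scheduled for more than 30 minutes.\nCheck the details of the pod with the following command:\noc describe -n default pod prometheus-example-app-766ddd6bc4-5s4j2","summary":"Pod cannot be scheduled."},"labels":{"alertname":"KubePodNotScheduled","namespace":"default","severity":"warning"}}]'

# jq -r decodes the \n escapes, so the description prints as three lines.
printf '%s' "$sample" | alert_description
```

      Against a live cluster, the same function could be appended to the curl command above in place of the bare jq call. The third printed line is the full oc describe command, which is the part the console cuts short.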

      In the admin console, however, the Description is truncated: the text "default pod prometheus-example-app-766ddd6bc4-5s4j2" is missing after

      Check the details of the pod with the following command: oc describe -n

              anpicker@redhat.com Andrew Pickering
              juzhao@redhat.com Junqi Zhao