  1. OpenShift Request For Enhancement
  2. RFE-4697

KubePodNotReady and other default alerts for user projects


    • Feature Request
    • Resolution: Unresolved
    • Undefined
    • None
    • None
    • Node
    • False
    • None
    • False
    • Not Selected

      1. Proposed title of this feature request
      Default/recommended alerts for user projects

      2. What is the nature and description of the request?
      When creating custom alerts for user projects, it would be useful to have some alerts created automatically for common use cases. These could be modeled on existing infrastructure alerts such as KubePodNotReady and KubePodCrashLooping.

      This could be enabled on a per-namespace basis, so that developers and admins receive alerts from important projects but not from demo or test projects.

      Currently, users can do this manually by creating a PrometheusRule in each project; for example:

      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: example-alert-2
        namespace: example-namespace
      spec:
        groups:
        - name: example
          rules:
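          - alert: PodNotReadyAlert
            # kube_pod_status_phase exports a 0/1 series for every phase, so sum (not group) is needed for "> 0" to filter.
            # Error and CrashLoopBackOff are not pod phases and never match, so they are left out of the regex.
            expr: sum by (phase) (kube_pod_status_phase{job="kube-state-metrics",namespace="example-namespace",phase=~"Pending|Unknown|Failed"}) > 0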
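
      A companion rule for the crash-loop case (the KubePodCrashLooping analogue mentioned in this request) might look like the sketch below. The rule name, group name, alert name, 5m window, 15m duration, and severity label are all assumptions for illustration. The expression is modeled on the upstream KubePodCrashLooping alert and is not taken from any rule shipped with OpenShift.

      ```yaml
      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: example-alert-crashloop      # hypothetical name
        namespace: example-namespace
      spec:
        groups:
        - name: example-crashloop          # hypothetical group name
          rules:
          - alert: PodCrashLoopingAlert    # hypothetical alert name
            # CrashLoopBackOff is a container waiting reason, which kube-state-metrics
            # exposes as kube_pod_container_status_waiting_reason, not as a pod phase.
            expr: max_over_time(kube_pod_container_status_waiting_reason{job="kube-state-metrics",namespace="example-namespace",reason="CrashLoopBackOff"}[5m]) >= 1
            for: 15m                       # assumed; avoids firing on brief restarts
            labels:
              severity: warning            # assumed severity
            annotations:
              summary: 'Container {{ $labels.container }} in pod {{ $labels.pod }} is crash looping.'
      ```

      As with the rule above, this would have to be created once per project, which is the manual step this request asks to automate.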
      

      Automating this could improve the reliability of customer clusters and spare customers from having to write their own PromQL for logic we have already solved.

      If this is not a good RFE candidate, it could instead become an example alert in the documentation.

      3. Why does the customer need this? (List the business requirements here)
      Improve the reliability of applications by making it easier to detect when pods are in a NotReady state.

      4. List any affected packages or components.
      Alertmanager

            gausingh@redhat.com Gaurav Singh
            rhn-support-stwalter Steven Walter