Machine Config Operator / MCO-427

Add missing runbooks for Prometheus rules


    • Type: Story
    • Resolution: Unresolved
    • OCPSTRAT-554 - Improving error handling, propagation, collection, and disambiguation for users

      Comparing our Prometheus rules to the entries in the runbook repo shows that a few runbooks are missing. I used yq to list the alerts we define and checked each one against the runbook repo. The following Prometheus rules have no runbook entry:

      • ExtremelyHighIndividualControlPlaneMemory
      • HighOverallControlPlaneMemory
      • KubeletHealthState
      • MCDPivotError
      • MCDRebootError
      • SystemMemoryExceedsReservation

       

      Done When:

      • Each of the Prometheus rules listed above has an entry in the runbook repository.

      Notes:

      • I've excluded MCDDrainError (as MCCDrainError) because that effort is being tracked here: https://issues.redhat.com/browse/MCO-88.
      • I've also excluded MCDRebootError from this story, even though it appears in the list above, because it needs additional investigative work. That work is being tracked in https://issues.redhat.com/browse/MCO-203.
      • The yq command I used to generate the above list is: $ yq eval-all '.spec[][].rules[].alert' ./install/0000_90_machine-config-operator_01_prometheus-rules.yaml | sort | uniq
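      The comparison against the runbook repo could be scripted along these lines. This is only a sketch: the function name missing_runbooks, the alerts.txt file, and the one-file-per-alert layout (<AlertName>.md in a single directory) are assumptions for illustration, not something this issue confirms about the runbook repo.

```shell
# Print every alert name that has no matching runbook file.
#   $1: file with one alert name per line (for example, the saved output
#       of the yq command in the notes above)
#   $2: directory expected to hold one <AlertName>.md file per alert
#       (assumed layout; adjust to the real runbook repo structure)
missing_runbooks() {
  alerts_file=$1
  runbook_dir=$2
  # sort -u de-duplicates, like the `sort | uniq` in the original command
  sort -u "$alerts_file" | while IFS= read -r alert; do
    # skip empty lines
    [ -n "$alert" ] || continue
    # report the alert only when no runbook file exists for it
    [ -f "$runbook_dir/$alert.md" ] || printf '%s\n' "$alert"
  done
}

# Example usage (paths are placeholders):
#   missing_runbooks alerts.txt /path/to/runbooks/for/this/operator
```

      Alerts that are tracked elsewhere, such as the ones excluded in the notes above, would still need to be filtered out of the result by hand.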

            Assignee: Unassigned
            Reporter: Zack Zlotnik (zzlotnik@redhat.com)