• Type: Spike
    • Resolution: Done
    • Priority: Major

      Which alerting rules would be useful for COO? One obvious rule is to alert on failed reconciliations (using controller_runtime_reconcile_errors_total and controller_runtime_reconcile_total), but we should not limit ourselves to the out-of-the-box metrics provided by the controller-runtime library.

            [COO-484] Define the alerting strategy for COO

            Simon Pasquier added a comment - See COO-485 for the implementation

            Simon Pasquier added a comment - cc jfajersk@redhat.com  

            Simon Pasquier added a comment - edited

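            For now I'd recommend alerting on failed reconciliation loops using the following expression, which gives the fraction of reconciliations that ended in an error over the last 5 minutes, per controller and namespace: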

            sum by(controller,namespace) (rate(controller_runtime_reconcile_total{result="error",job="observability-operator",namespace="coo"}[5m]))
            /
            sum by(controller,namespace) (rate(controller_runtime_reconcile_total{job="observability-operator",namespace="coo"}[5m])) 

            It should be sufficient for GA.
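
As a rough sketch only, the ratio above could be wrapped into a Prometheus alerting rule along these lines. The group name, alert name, 10% threshold, `for` duration, severity and annotation text are all assumptions chosen for illustration; none of them come from this issue or from COO-485. Only the expression itself is taken from the comment above.

```yaml
# Hypothetical Prometheus rule file; everything except the PromQL ratio is an assumption.
groups:
  - name: observability-operator            # assumed group name
    rules:
      - alert: ObservabilityOperatorReconcileErrors   # assumed alert name
        # Ratio of failed to total reconciliations per controller, from the comment above.
        # "/" binds tighter than ">", so the comparison applies to the whole ratio.
        expr: |
          sum by(controller,namespace) (rate(controller_runtime_reconcile_total{result="error",job="observability-operator",namespace="coo"}[5m]))
          /
          sum by(controller,namespace) (rate(controller_runtime_reconcile_total{job="observability-operator",namespace="coo"}[5m]))
          > 0.1
        for: 15m                             # assumed: fire only if the condition persists
        labels:
          severity: warning                  # assumed severity
        annotations:
          summary: "Controller {{ $labels.controller }} in {{ $labels.namespace }} is failing reconciliations."
```

When a controller has performed no reconciliations in the window, the denominator is zero and the ratio is NaN, which never satisfies the comparison, so idle controllers would not fire this alert.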

              spasquie@redhat.com Simon Pasquier
              spasquie@redhat.com Simon Pasquier