    • 5
    • False
    • False
      - SLOs, SLIs, and alerts can be inspected in Promlens (https://promlens.stage.devshift.net/) using PromQL.
      - Simulate a failure condition (which component to fail depends on the SLO/SLI definition) and confirm that the SLO/SLI value reflects it accurately.
      - Simulate failure conditions and confirm that the alerts pick them up.

    • Observability Sprint 2023-04
    • No

      Value:

      For the Managed Services deployment of Hypershift clusters, we need to create SLOs/SLIs and alerts for the Hypershift addon agent because it is in the critical path of hosted cluster creation.

      The Hypershift add-on manager metrics are documented here:
      https://github.com/stolostron/hypershift-addon-operator/blob/main/docs/advanced/prometheus_metrics.md

      If mce_hs_addon_install_in_progress_bool=0 and (mce_hs_addon_hypershift_operator_degraded_bool=1 or mce_hs_addon_ext_dns_operator_degraded_bool=1), the Hypershift operator is not available.
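To make the availability condition concrete, here is a minimal Python sketch of the same boolean logic. The function name and parameter names are hypothetical; only the three metric names and the 0/1 semantics come from the ticket. The actual SLI would presumably be expressed in PromQL, so treat this as an illustration of the rule, not the implementation.

```python
def hypershift_operator_unavailable(install_in_progress: int,
                                    hypershift_operator_degraded: int,
                                    ext_dns_operator_degraded: int) -> bool:
    """Evaluate the unavailability rule from the ticket.

    Arguments are the current values (0 or 1) of:
      mce_hs_addon_install_in_progress_bool
      mce_hs_addon_hypershift_operator_degraded_bool
      mce_hs_addon_ext_dns_operator_degraded_bool
    """
    # A degraded flag is ignored while an install is still in progress.
    not_installing = install_in_progress == 0
    any_degraded = (hypershift_operator_degraded == 1
                    or ext_dns_operator_degraded == 1)
    return not_installing and any_degraded
```

Gating on the install-in-progress flag keeps a degraded reading taken during installation from being counted against availability.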

      From rokejungrh:
      For the following count metrics, we should compute the failure rate over a 10-minute window and generate an alert if the rate exceeds a set threshold:

      • mce_hs_addon_placement_score_failure_count
      • mce_hs_addon_cluster_claims_failure_count
      • mce_hs_addon_hub_sync_failure_count
      • mce_hs_addon_kubeconfig_secret_copy_failure_count
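The rate-over-threshold idea above can be sketched as follows. This assumes each failure metric is a monotonically increasing counter that may reset to zero, and that samples arrive as (timestamp, value) pairs. The function names and the threshold are hypothetical, since the ticket does not set a threshold. It is a simplified calculation and does not reproduce the extrapolation that Prometheus's own `rate()` performs.

```python
from typing import Sequence, Tuple

WINDOW_SECONDS = 600  # the 10-minute window named in the ticket


def failures_in_window(samples: Sequence[Tuple[float, float]],
                       now: float,
                       window: float = WINDOW_SECONDS) -> float:
    """Return the counter increase within [now - window, now].

    samples: (unix_timestamp, counter_value) pairs sorted by timestamp.
    """
    values = [v for t, v in samples if now - window <= t <= now]
    increase = 0.0
    for prev, cur in zip(values, values[1:]):
        # If the counter dropped, assume it reset to zero and count
        # the new value as the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    return increase


def should_alert(samples: Sequence[Tuple[float, float]],
                 now: float,
                 threshold_per_second: float,
                 window: float = WINDOW_SECONDS) -> bool:
    """True when the average failure rate over the window exceeds the threshold."""
    rate = failures_in_window(samples, now, window) / window
    return rate > threshold_per_second
```

For example, with samples `[(0, 0), (300, 3), (600, 6)]` and `now=600`, the increase is 6 failures over 600 seconds, so a hypothetical threshold of 0.005 failures per second would trigger an alert while 0.02 would not.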

       

      Josh & Roke created the SLO dashboard here:
      https://grafana.stage.devshift.net/d/87f7f256a3506f65da8694b290e8d8e4/acm-hypershift-addon[…]-cluster=&var-datasource=hypershift-observatorium-stage

      The dashboard currently shows no data; it is waiting on the monitoring stack setup needed to send metrics to RHOBS.

      Definition of Done for Engineering Story Owner (Checklist)

      • ...

      Development Complete

              dbennett@redhat.com Disaiah Bennett
              jbanerje@redhat.com Joydeep Banerjee
              Xiang Yin Xiang Yin