  1. Red Hat Advanced Cluster Security
  2. ROX-29370

ACSCS: reduce the impact of temporary router issues on burn rate

    • Type: Task
    • Resolution: Won't Do
    • Priority: Undefined
    • ACS Cloud Service
    • Future Sustainability

      Description

      This ticket was created as a follow-up to itn-2025-00118.

      The results of the Prometheus Blackbox HTTP probe weigh heavily on the error budget burn rate.

      During the above incident, the probes intermittently failed with 502/503 responses because of the load on the default router. This led to a 200% error budget burn overall, even though the HTTP health check was only flapping and the UI for other tenants was mostly working.

      We should consider changing the calculation so that the burn rate and error budget are less affected by temporary health check failures.
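
      To illustrate the kind of change discussed above, here is a minimal Python sketch of one possible approach: counting a probe failure only when it is part of a run of consecutive failures, so that isolated 502/503 responses do not feed the burn rate. This is not the calculation ACSCS uses. The function names, the SLO target, and the run-length threshold are all hypothetical, and the ticket itself does not prescribe any particular method.

```python
from typing import List


def burn_rate(failures: int, total: int, slo_target: float) -> float:
    """Burn rate = observed error ratio / allowed error ratio (1 - SLO target).

    A value of 1.0 means the error budget is consumed exactly at the
    rate the SLO allows. `slo_target` is an assumed value, e.g. 0.99.
    """
    if total == 0:
        return 0.0
    return (failures / total) / (1.0 - slo_target)


def debounce(results: List[bool], min_consecutive: int) -> List[bool]:
    """Keep a failure only if it belongs to a run of at least
    `min_consecutive` consecutive failures. True means the probe succeeded.
    """
    out = [True] * len(results)
    i = 0
    while i < len(results):
        if results[i]:
            i += 1
            continue
        # Find the end of the current run of failures.
        j = i
        while j < len(results) and not results[j]:
            j += 1
        if j - i >= min_consecutive:
            for k in range(i, j):
                out[k] = False
        i = j
    return out


def debounced_burn_rate(
    results: List[bool], slo_target: float, min_consecutive: int
) -> float:
    """Burn rate computed from debounced probe results (hypothetical helper)."""
    filtered = debounce(results, min_consecutive)
    return burn_rate(filtered.count(False), len(filtered), slo_target)


if __name__ == "__main__":
    # Hypothetical flapping probe: isolated failures plus one sustained outage.
    probes = [True, False, True, True, False, True,
              True, True, False, False, False, True]
    print("raw:", burn_rate(probes.count(False), len(probes), 0.99))
    print("debounced:", debounced_burn_rate(probes, 0.99, 3))
```

      The trade-off of such a filter is that genuine short outages are also hidden, so the threshold would need to be chosen against the probe interval and the SLO definition.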

              Assignee: Unassigned
              Reporter: Johannes Malsam (rh-ee-jmalsam)
              ACS Cloud Service
