We currently offer SLOs for Pulls (99.9%) and Pushes (99.5%). The SLOs are calculated from CatchPoint data derived from hundreds of actions, which is insufficient to give a true sense of the service's availability. We need to identify a way to base the calculation on metrics from Nginx or the ALB instead. Please coordinate with AppSRE on the metrics and the alerting, because they hold the pager and will be woken up first during an SLO breach. The Grafana graphs will need to be updated to calculate the SLOs from the new metrics.
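As a rough illustration of what a request-based calculation could look like, here is a minimal Python sketch. It assumes that availability is the share of requests that did not end in a 5xx response and that per-operation request and error counts can be pulled from the Nginx or ALB metrics. The definition of a "bad" request and all names (`RequestCounts`, `availability`, `meets_slo`) are hypothetical and would need to be agreed with AppSRE. Only the two targets come from this ticket.

```python
from dataclasses import dataclass

# SLO targets as stated in this ticket.
SLO_TARGETS = {"pull": 0.999, "push": 0.995}


@dataclass
class RequestCounts:
    """Request totals for one operation over the SLO window (hypothetical shape)."""
    total: int
    server_errors: int  # assumption: only 5xx responses count against the SLO


def availability(counts: RequestCounts) -> float:
    """Fraction of requests that did not fail with a server error."""
    if counts.total == 0:
        # No traffic in the window: treat as fully available (assumption).
        return 1.0
    return (counts.total - counts.server_errors) / counts.total


def meets_slo(operation: str, counts: RequestCounts) -> bool:
    """Compare measured availability against the target for the operation."""
    return availability(counts) >= SLO_TARGETS[operation]


if __name__ == "__main__":
    pulls = RequestCounts(total=1_000_000, server_errors=500)
    pushes = RequestCounts(total=1_000, server_errors=10)
    print("pull", availability(pulls), meets_slo("pull", pulls))
    print("push", availability(pushes), meets_slo("push", pushes))
```

The same ratio, good requests divided by total requests, is what the updated Grafana panels and the alert thresholds would need to express in whatever query language backs them.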