-
Story
-
Resolution: Unresolved
The importance of the notifications-gw component is increasing with OCM onboarding, but the current set of SLOs does not seem to cover it: https://gitlab.cee.redhat.com/service/app-interface/-/blob/master/data/services/insights/notifications/slo-documents/notifications.yml?ref_type=heads
We should ensure SLOs are defined for this component.
- Metrics to include in SLOs
As Josef mentioned in the following Slack thread:
Hello, firstly, there seems to be a mismatch between https://gitlab.cee.redhat.com/core-platform-apps/notifications-docs/-/blob/master/modules/SLO-document/pages/SLO-definitions.adoc and https://gitlab.cee.redhat.com/service/app-interface/-/blob/master/data/services/insights/notifications/slo-documents/notifications.yml?ref_type=heads
The former defines availability using the up() metric (i.e. are any pods running?) while the latter defines availability as the proportion of non-500 responses in the public API (excluding notifications-gw) and sets the goal for <1%. Both measure latency of the public APIs only (i.e. excluding notifications-gw).
As for the alerts, the only alert that touches notifications-gw seems to be the one you linked (i.e. an alert will fire if all the pods disappear). There seems to be no alert for when notifications-gw keeps returning non-2xx responses or takes forever to respond.
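
To make the gap concrete, here is a minimal sketch of the three request-based checks discussed above, applied to notifications-gw traffic in a single evaluation window. It is an illustration only, not the existing alerting config: the `WindowStats` type, the `evaluate` function, and the thresholds for non-2xx and slow responses are hypothetical, and "non-500" is assumed to mean any 5xx status. Only the 1% error goal comes from the SLO document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowStats:
    """Request counts for one evaluation window (hypothetical shape)."""
    total: int          # all requests seen in the window
    server_errors: int  # responses with a 5xx status (assumed meaning of "500")
    non_2xx: int        # responses with any status outside 200-299
    slow: int           # responses slower than the chosen latency threshold


def ratio(part: int, total: int) -> float:
    """Share of `part` in `total`; 0.0 when there was no traffic."""
    return part / total if total else 0.0


def evaluate(
    stats: WindowStats,
    max_error_ratio: float = 0.01,    # the <1% goal from the SLO document
    max_non_2xx_ratio: float = 0.05,  # assumption, not from the source
    max_slow_ratio: float = 0.05,     # assumption, not from the source
) -> List[str]:
    """Return the names of the checks that are breached in this window."""
    breaches = []
    if ratio(stats.server_errors, stats.total) > max_error_ratio:
        breaches.append("availability")
    if ratio(stats.non_2xx, stats.total) > max_non_2xx_ratio:
        breaches.append("non_2xx")
    if ratio(stats.slow, stats.total) > max_slow_ratio:
        breaches.append("latency")
    return breaches


if __name__ == "__main__":
    # 2% server errors breaches the 1% goal; the other two checks pass.
    print(evaluate(WindowStats(total=1000, server_errors=20, non_2xx=30, slow=10)))
```

A window with no traffic reports no breaches here, which is the case an up-based alert still has to cover.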
mbarcina@redhat.com to add details from previous Slack