- Epic
- Resolution: Done
- Normal
- None
- Kuryr: Monitor status of OpenStack resources
- False
- False
- Done
- 0% To Do, 0% In Progress, 100% Done
- Undefined
Goal
As an operator running OCP with Kuryr, I'd like to have metrics and alerts for important Kuryr resources - e.g. if the K8s API or DNS LB goes into ERROR status or some of its members cannot be created.
Problem
Currently there's no way to monitor the OpenStack resources created by Kuryr other than querying the OpenStack APIs ourselves. We'd need to identify viable metrics and implement a utility that calculates and exposes them, as well as alerts that fire when one of those resources is in a bad state.
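To make the idea of "calculate and expose" concrete, here is a minimal sketch of what such a utility could compute. Everything in it is an assumption for illustration: the LoadBalancer shape, the metric names kuryr_lb_error and kuryr_lb_members_missing, and the helper functions are hypothetical and not part of Kuryr. It only shows how per-LB status and member counts could be turned into samples in the Prometheus text exposition format, and how a simple alert condition could be derived from them.

```python
from dataclasses import dataclass


@dataclass
class LoadBalancer:
    # Hypothetical, simplified view of a load balancer created by Kuryr.
    name: str
    provisioning_status: str  # e.g. "ACTIVE" or "ERROR"
    expected_members: int     # members Kuryr intended to create
    actual_members: int       # members that actually exist in OpenStack


def compute_metrics(lbs):
    """Return (metric name, labels, value) samples for each load balancer."""
    samples = []
    for lb in lbs:
        labels = {"name": lb.name}
        in_error = 1 if lb.provisioning_status == "ERROR" else 0
        missing = max(lb.expected_members - lb.actual_members, 0)
        # Metric names are illustrative assumptions, not existing Kuryr metrics.
        samples.append(("kuryr_lb_error", labels, in_error))
        samples.append(("kuryr_lb_members_missing", labels, missing))
    return samples


def render(samples):
    """Render samples in the Prometheus text exposition format."""
    lines = []
    for name, labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


def firing_alerts(samples):
    """Return sorted names of load balancers with any non-zero sample."""
    return sorted({labels["name"] for _, labels, value in samples if value > 0})


if __name__ == "__main__":
    lbs = [
        LoadBalancer("api", "ACTIVE", 3, 1),  # ACTIVE but missing 2 members
        LoadBalancer("dns", "ERROR", 2, 2),   # all members present, LB in ERROR
    ]
    samples = compute_metrics(lbs)
    print(render(samples), end="")
    print("alerting:", firing_alerts(samples))
```

In a real implementation the LoadBalancer values would presumably come from the OpenStack APIs and the alert condition would live in the monitoring stack rather than in Python; the sketch keeps both in one place only to show the shape of the data.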
Why is this important
We had an incident in which the OCP cluster was considered healthy even though the OpenShift API LB was missing 2 members. Had this been signaled to the user, they could have avoided downtime and escalation.
Estimate (XS, S, M, L, XL, XXL): M
There are no Sub-Tasks for this issue.