- Spike
- Resolution: Won't Do
During the investigation of Bug 2104511, the idea came up of monitoring or alerting on expected replicas versus observed replicas for a MachineSet.
This investigation should examine the possibility of exporting metrics based on the replica count a MachineSet currently has and the number of Machines that actually exist. Using these metrics, we can start to build profiles of the average times and behaviors of scaling operations.
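As a rough illustration of what such metrics could look like, the sketch below renders two gauges per MachineSet in the Prometheus text exposition format. The metric names (`machineset_expected_replicas`, `machineset_observed_machines`), the `MachineSetSnapshot` type, and the assumption that the expected count comes from the MachineSet's `spec.replicas` are all hypothetical choices made for this example, not existing Machine API metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineSetSnapshot:
    """Point-in-time view of one MachineSet (hypothetical type for this sketch)."""
    namespace: str
    name: str
    expected_replicas: int  # assumed to come from the MachineSet's spec.replicas
    observed_machines: int  # Machine objects that actually exist for it


def render_metrics(snapshots):
    """Render snapshots as Prometheus text exposition format.

    Samples are grouped per metric family, as the format requires.
    Label values are not escaped here; Kubernetes object names cannot
    contain quotes or backslashes, so this sketch skips that step.
    """
    families = [
        ("machineset_expected_replicas",
         "Replicas the MachineSet is expected to have.",
         lambda s: s.expected_replicas),
        ("machineset_observed_machines",
         "Machine objects that currently exist for the MachineSet.",
         lambda s: s.observed_machines),
    ]
    lines = []
    for metric, help_text, value_of in families:
        lines.append(f"# HELP {metric} {help_text}")
        lines.append(f"# TYPE {metric} gauge")
        for s in snapshots:
            labels = f'namespace="{s.namespace}",name="{s.name}"'
            lines.append(f"{metric}{{{labels}}} {value_of(s)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [MachineSetSnapshot("openshift-machine-api", "worker-a", 3, 2)]
    print(render_metrics(demo), end="")
```

Exporting the two counts as separate gauges, rather than a single precomputed difference, keeps the raw values available for profiling scaling times while still allowing the gap to be derived at query time.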
Another perspective on this is creating alerts for situations where the observed replicas take a long time to reach the expected count. Although we have error conditions for Machines with no running phase and Machines with no Nodes, this alert could detect conditions where a Machine object is never created.
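The alerting idea can be sketched as a small tracker that remembers when a MachineSet first fell short of its expected count and reports once that shortfall has lasted longer than a threshold. The `ReplicaGapTracker` class, its `observe` method, and the 600-second threshold are hypothetical names and values chosen for illustration. In a real deployment the same logic would more likely be expressed as a Prometheus alerting rule with a `for` duration than as custom code.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaGapTracker:
    """Flags MachineSets whose observed count stays below the expected
    count for at least threshold_seconds (hypothetical helper)."""
    threshold_seconds: float
    # Maps a MachineSet key to the timestamp when its shortfall began.
    _gap_since: dict = field(default_factory=dict, init=False, repr=False)

    def observe(self, key, expected, observed, now):
        """Record one sample and return True if the alert should fire.

        Only scale-up shortfalls (observed < expected) are tracked,
        which covers the case where a Machine object is never created.
        """
        if observed >= expected:
            # Gap closed: forget any earlier start time.
            self._gap_since.pop(key, None)
            return False
        started = self._gap_since.setdefault(key, now)
        return now - started >= self.threshold_seconds


if __name__ == "__main__":
    tracker = ReplicaGapTracker(threshold_seconds=600)
    key = "openshift-machine-api/worker-a"
    for t in (0, 300, 600):
        print(t, tracker.observe(key, expected=3, observed=2, now=t))
```

Because the tracker compares object counts rather than Machine phases, it would also fire when no Machine object exists at all, which the existing per-Machine conditions cannot see.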
For background on this issue, see this thread: https://coreos.slack.com/archives/CBZHF4DHC/p1660837393467059
- relates to
  - OCPCLOUD-1704 RFE: Alert on consistent ScaleUpTimedOut (To Do)
  - OCPCLOUD-1660 Improve error conditions for MachineSet failing to create new Machines (Closed)