-
Story
-
Resolution: Done
-
Critical
Hive controllers should settle down to a low roar in steady state.
When a config change happens, we expect a spike, which should last a relatively short time, then settle back to steady state.
If we can figure out what "steady state" and these spikes look like, we should alert when a spike lasts longer than we expect. A prolonged spike can point to bugs in controllers, such as the MachinePool ownedLabels/ownedTaints thrash from ITN-2024-00101 / HIVE-2541.
Ideally this metric would track the time between when a request is queued and when it is serviced. That's Hard™. But we should be able to track basic controller metrics such as queue depth. A problem we saw was that these upstream controller metrics didn't seem to be available via hive!
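To make the "spike lasts longer than we expect" idea concrete, here is a minimal sketch of the check, assuming queue depth is already available as a series of (timestamp in seconds, depth) samples. The names `long_spikes`, `steady_depth`, and `max_spike_seconds` are hypothetical, and the thresholds are placeholders until we know what steady state actually looks like. This is not how Hive does it today.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Spike:
    """A contiguous period during which queue depth stayed above steady state."""
    start: float  # timestamp of the first sample above the threshold
    end: float    # timestamp of the sample that ended (or last observed) the spike


def long_spikes(
    samples: Iterable[Tuple[float, int]],
    steady_depth: int,          # hypothetical: highest depth still considered "steady state"
    max_spike_seconds: float,   # hypothetical: longest spike we consider normal
) -> List[Spike]:
    """Return spikes that lasted longer than max_spike_seconds.

    Samples must be ordered by timestamp. A spike still in progress at the
    last sample is reported if it has already exceeded the limit.
    """
    found: List[Spike] = []
    spike_start: Optional[float] = None
    last_ts: Optional[float] = None

    for ts, depth in samples:
        if depth > steady_depth:
            # Entering (or continuing) a spike.
            if spike_start is None:
                spike_start = ts
        elif spike_start is not None:
            # Depth is back at steady state: the spike ends at this sample.
            if ts - spike_start > max_spike_seconds:
                found.append(Spike(spike_start, ts))
            spike_start = None
        last_ts = ts

    # Handle a spike that has not settled by the end of the data.
    if spike_start is not None and last_ts is not None:
        if last_ts - spike_start > max_spike_seconds:
            found.append(Spike(spike_start, last_ts))

    return found
```

With a 30-second sample interval, a steady-state threshold of 5, and a 120-second limit, a 60-second burst after a config change would be ignored, while a thrash that keeps the queue deep for 150 seconds would be flagged. In practice the same rule would more likely live in an alerting rule than in code, but the logic is the same.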