  1. OpenShift Hive
  2. HIVE-2543

Alert on excessive reconciles/$time/#spokes for longer than $duration


    • Type: Story
    • Resolution: Done
    • Priority: Critical

      Hive controllers should settle down to a low roar in steady state.

      When a config change happens, we expect a spike, which should last a relatively short time, then settle back to steady state.

      If we can figure out what "steady state" and such spikes look like, we should alert when a spike lasts longer than we expect. A prolonged spike can point to bugs in controllers, such as the MachinePool ownedLabels/ownedTaints thrash from ITN-2024-00101 / HIVE-2541.

      Ideally, this metric would track the time between when a request is queued and when it is serviced. That's Hard™. But we should at least be able to track basic controller metrics such as queue depth. One problem we saw was that these upstream controller metrics didn't seem to be available via Hive!

              sumehta Suhani Mehta
              efried.openshift Eric Fried
