OpenShift Hive / HIVE-2088

Clean up our monitoring story

    • Type: Story
    • Resolution: Done
    • Priority: Minor

      HIVE-1581 tracked some work to have hive set up metrics configuration. Specifically, via #1525, setting HiveConfig.Spec.ExportMetrics=true would cause hive-operator to do the following to the TargetNamespace:

      • Add the openshift.io/cluster-monitoring: "true" label
      • Create ServiceMonitors for hive-controllers and hive-clustersync
      • Create a Role and RoleBinding allowing the cluster's Prometheus instances (both the cluster and user monitoring ones) to read Services, Endpoints and Pods
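
For readers less familiar with these objects, the three steps above might look roughly like the following when written as plain Kubernetes manifests in Python dicts. This is only a sketch under stated assumptions: the Role/RoleBinding name, the ServiceMonitor selector labels, the port name, and the Prometheus service account names are placeholders based on a default OpenShift install, not copied from hive-operator.

```python
# Sketch only. Anything marked "assumed" is a placeholder, not taken from hive-operator.

def namespace_label_patch():
    """Merge patch that adds the cluster-monitoring label to the target namespace."""
    return {"metadata": {"labels": {"openshift.io/cluster-monitoring": "true"}}}


def service_monitor(name, namespace):
    """A ServiceMonitor for one hive component."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumed: the component's Service is labeled control-plane=<name>
            # and exposes a port named "metrics".
            "selector": {"matchLabels": {"control-plane": name}},
            "endpoints": [{"port": "metrics"}],
        },
    }


def prometheus_read_rbac(namespace, name="prometheus-read"):  # name is assumed
    """Role and RoleBinding letting both Prometheus instances read scrape targets."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [{
            "apiGroups": [""],
            "resources": ["services", "endpoints", "pods"],
            "verbs": ["get", "list", "watch"],
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role", "name": name},
        "subjects": [
            # Assumed service accounts for the cluster and user workload stacks.
            {"kind": "ServiceAccount", "name": "prometheus-k8s",
             "namespace": "openshift-monitoring"},
            {"kind": "ServiceAccount", "name": "prometheus-user-workload",
             "namespace": "openshift-user-workload-monitoring"},
        ],
    }
    return role, binding


def export_metrics_objects(target_namespace):
    """Everything created in the target namespace when ExportMetrics is true."""
    monitors = [service_monitor(n, target_namespace)
                for n in ("hive-controllers", "hive-clustersync")]
    role, binding = prometheus_read_rbac(target_namespace)
    return monitors + [role, binding]


if __name__ == "__main__":
    for obj in export_metrics_objects("hive"):
        print(obj["kind"], obj["metadata"]["name"])
```

Note that the ServiceMonitor names are fixed, which is what makes a collision with manually created ServiceMonitors of the same name possible.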

      HIVE-1655 resulted from testing the above, complaining that this configuration was not undone when ExportMetrics was removed or set to false. #1577 made it so.

      A couple of issues here:

      • The monitoring infrastructure has matured since then. It is not clear that the configuration laid down when ExportMetrics=true conforms to current recommended/supported best practices. For instance, I think the cluster-monitoring label will cause the metrics to be scraped by the cluster monitoring stack as opposed to the user one. This is (probably? usually?) not appropriate for an operator installed via OLM – which according to HIVE-1581 is what this was meant to help with.
      • Consumers attempting to set things up manually (without ExportMetrics) can unexpectedly have their configuration removed. This happened in real life when OpenShift CI tried to create ServiceMonitors with the same name as the ones hive manages. Fortunately, they were able to work around the issue by a) renaming their ServiceMonitors; and b) using user workload monitoring, which uses the openshift.io/workload-monitoring: "true" label rather than the cluster-monitoring one.

      This card is to crisp up our story around ExportMetrics. What scenarios is it supposed to be used for? Is it (still) necessary/desirable to provide something in this space? If so, let's clean it up so it doesn't interfere with manual setup. If not, let's deprecate and disable it. And either way, let's make docs/monitoring.md reflect reality.
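
If the "clean it up" route is taken, one possible guard (not something hive-operator is known to do today) is to mark the objects the operator creates and to delete only marked objects when ExportMetrics is off. A minimal sketch, assuming a hypothetical marker label `example.com/managed-by` and objects represented as plain dicts:

```python
# Sketch only: the marker label and its value are hypothetical.
MANAGED_BY_LABEL = "example.com/managed-by"
MANAGED_BY_VALUE = "hive-operator"

# Names the operator would otherwise delete unconditionally.
MANAGED_NAMES = {"hive-controllers", "hive-clustersync"}


def is_operator_managed(obj):
    """True only if the object carries the operator's marker label."""
    labels = obj.get("metadata", {}).get("labels") or {}
    return labels.get(MANAGED_BY_LABEL) == MANAGED_BY_VALUE


def select_for_cleanup(existing):
    """Names of ServiceMonitors that are safe to delete when ExportMetrics is off.

    An object is selected only if its name is one the operator uses AND it
    carries the marker, so a manually created ServiceMonitor that happens to
    share a name is left alone.
    """
    return [
        obj["metadata"]["name"]
        for obj in existing
        if obj["metadata"]["name"] in MANAGED_NAMES and is_operator_managed(obj)
    ]


# Example: one operator-created object, one manual object with a colliding name.
existing = [
    {"metadata": {"name": "hive-controllers",
                  "labels": {MANAGED_BY_LABEL: MANAGED_BY_VALUE}}},
    {"metadata": {"name": "hive-clustersync"}},  # created by hand, no marker
]

to_delete = select_for_cleanup(existing)
```

Here only `hive-controllers` is selected; the hand-made `hive-clustersync` survives. The same check could gate creation, so the operator declines to overwrite an unmarked object of the same name.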

      cc hongkliu sumehta 

              Assignee: Unassigned
              Reporter: Eric Fried (efried.openshift)
              Feilian Xie (Inactive)