OpenShift Monitoring / MON-2980

[consultancy] Additional OpenShift Console/ODC metrics+telemetry


    • Type: Task
    • Resolution: Done
    • Priority: Major
    • Sprint: MON Sprint 235

      We, the Developer Console team (ODC), want to add several additional telemetry metrics for ODC in 4.13. We track this work as Epic ODC-7171.

      Quick note upfront: I have also written these questions into the following doc, and I will collect there the answers I get in this ticket and on Slack:

      https://docs.google.com/document/d/1_YH_lrWuaas294KzZjfSMpB8Q_yO5LScj-DWDyPUkt0/edit

       

      I have also started a (WIP) document describing our console metrics:

      https://docs.google.com/document/d/1PqbKv_-q2PW8mK3lwGEjpLwO5jdf9TojOchnjUY9YMU/edit

       

      Describe the issue, question or problem you have.

      The metrics we want to collect all relate to one question: is the console used, and how much? We focused on numbers that we cannot easily measure with our Segment telemetry (frontend analytics):

      1. Is the console installed at all? (The console-operator is optional starting with 4.12.)
      2. How many users use the console, and which kind of users (admins, developers)?
      3. How many logins are there, and by which kind of user (admins, developers)?
      4. How much is the console used (page views), and which perspective is used (admin, developer, ACM, others)?
      5. Did the customer restrict or disable a perspective (for example, disable the admin perspective for regular developers)?
      6. Which plugins did the customer install?

      Question 1 (related to metric 1 above)

      There are already metrics that show whether the console is installed (and running). Should we open a request to add these metrics to telemetry? Existing metrics that could answer this question are:
      console_url
      cluster_operator_up{name="console"}
      cluster_version_capability{name="Console"}

      No need to create our own _up metric for this, right?
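      A minimal sketch of how the three existing metrics could be combined into a single answer, written in Python over samples in the Prometheus text exposition format. It assumes that cluster_version_capability reports 0 for a disabled capability and 1 for an enabled one, and that cluster_operator_up is 1 when the operator reports Available; a missing series is treated as "unknown". console_url is left out, and the parser is deliberately simplified (no timestamps, no escaped quotes in label values).

```python
import re

# Simplified Prometheus text-format sample: name{labels} value
# (assumption for this sketch: no timestamps, no escaped quotes in label values).
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # ignore lines this simplified parser does not handle
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


def console_status(samples):
    """Classify the console as 'disabled', 'available', 'installed-not-available' or 'unknown'.

    Assumes capability 0 means the Console capability is disabled and
    cluster_operator_up == 1 means the console operator is Available.
    """
    capability = None
    operator_up = None
    for name, labels, value in samples:
        if name == "cluster_version_capability" and labels.get("name") == "Console":
            capability = value
        elif name == "cluster_operator_up" and labels.get("name") == "console":
            operator_up = value
    if capability == 0:
        return "disabled"
    if operator_up is None:
        return "unknown"
    return "available" if operator_up == 1 else "installed-not-available"
```

      If this reasoning holds, no dedicated console _up metric would be needed; the same classification could be written as a query over the existing series.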

      Question 2 (related to metrics 2-4 above)

      If we add a new console metric for the number of users, labeled by role (admin or developer), will it later be possible to get this information per cluster?

      For example, once we add the new metric and we (or you) later add it to telemetry (via the insights-operator, right?), does the job that pushes this data to the Red Hat DataHub already add a cluster ID or something similar?

      That would let us see, for example in DataHub or Tableau, which cluster IDs (or names) have the most console users (admins and developers), or the most console developers (filtered by label).
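      A minimal sketch of the per-cluster question, assuming that every sample arriving in the telemetry backend carries a cluster identifier label. The label name _id and the role label are assumptions for illustration, not confirmed names. The functions group a hypothetical per-role user gauge by cluster and rank the clusters, optionally for a single role.

```python
from collections import defaultdict


def users_per_cluster(samples, role=None):
    """Sum a per-role user gauge per cluster ID.

    `samples` is a list of (labels_dict, value) pairs for one metric.
    The cluster ID label "_id" and the "role" label are assumed names.
    If `role` is given, only samples with that role label are counted.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        if role is not None and labels.get("role") != role:
            continue
        totals[labels["_id"]] += value
    return dict(totals)


def top_clusters(totals, k):
    """Return the k (cluster_id, total) pairs with the highest totals; ties broken by ID."""
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:k]
```

      If such a cluster ID label is present, the equivalent PromQL would look roughly like topk(10, sum by (_id) (console_usage_users)), where console_usage_users is a hypothetical metric name.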

      Question 3 (related to metrics 2-3 above)

      We don't expect the number of console users per cluster to change quickly. But because users can be removed, I think we need a Gauge instead of a Counter, right?

      Is there a way to track this information less often? For example, just once or twice a day instead of every 4m30s? Does this matter to you?
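      One way to "track less often", sketched under assumptions: Prometheus keeps scraping (and telemetry keeps sampling every 4m30s) on its own schedule, but the console only recomputes the user count at most once per refresh interval and serves the cached value in between. The CachedGauge class and its parameters are hypothetical. It also shows the Gauge semantics from the question: unlike a Counter, the value may go down when users are removed.

```python
import time


class CachedGauge:
    """Hypothetical gauge whose value is recomputed at most once per `refresh_seconds`.

    Reads (e.g. scrapes) can happen as often as the scraper likes; the
    possibly expensive `compute` callable (such as counting users by role)
    only runs when the cached value is older than the refresh interval.
    Unlike a counter, the returned value is allowed to decrease.
    """

    def __init__(self, compute, refresh_seconds, clock=time.monotonic):
        self._compute = compute
        self._refresh = refresh_seconds
        self._clock = clock  # injectable clock, useful for testing
        self._value = None
        self._last = None

    def get(self):
        now = self._clock()
        if self._last is None or now - self._last >= self._refresh:
            self._value = self._compute()
            self._last = now
        return self._value
```

      With refresh_seconds=43200 the value would be recomputed at most twice a day, no matter how often it is read.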

      Question 4 (related to metrics 5-6 above)

      A cluster-admin can make some configuration changes "per console" (that is, per cluster), and we would like to measure them. For example, if the cluster-admin installs a plugin "xyz", we want to expose a gauge-style metric console_plugin_installed{name="xyz"} 1, similar to the common *_up metrics.

      I understand that the main concern here is keeping cardinality low, and we don't expect many plugins. Is this the way to go?
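      A minimal sketch of what the per-plugin series could look like in the Prometheus text exposition format, assuming the console knows the list of installed plugin names. console_plugin_installed is the name proposed above; the helper functions are hypothetical. Each plugin yields one gauge series with value 1, and a removed plugin simply stops being exported.

```python
def escape_label_value(value):
    """Escape a label value as required by the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_plugin_metric(plugin_names):
    """Render one console_plugin_installed{name="..."} 1 series per installed plugin.

    Names are de-duplicated and sorted so the output is stable across scrapes.
    Uninstalled plugins simply disappear from the output (gauge semantics).
    """
    lines = [
        "# HELP console_plugin_installed Console plugins installed on this cluster (hypothetical metric).",
        "# TYPE console_plugin_installed gauge",
    ]
    for name in sorted(set(plugin_names)):
        lines.append(f'console_plugin_installed{{name="{escape_label_value(name)}"}} 1')
    return "\n".join(lines) + "\n"
```

      Cardinality is bounded by the number of plugins per cluster, but plugin names are chosen by customers, so across the whole fleet the set of label values is open-ended; mapping unknown names to a fixed value such as "other" is one common way to bound it.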

      General question

      When the console Pod is restarted (for whatever reason), new series are created and cardinality grows, because the metrics on the cluster also carry the Pod name and Service name as labels.

      Is that an issue? Should we remove the Pod and Service name labels, and if so, how? For example, by creating a new Prometheus query for the metrics we've added?
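      A minimal sketch of aggregating away the per-Pod labels, similar in spirit to PromQL's max without (pod, service) (...) or sum without (...). The label names pod and service are assumptions about how the scraped series are labeled. Which aggregation fits depends on the metric: if every console replica reports the same cluster-wide user count, summing across Pods would double-count, so max fits better; for per-replica counts such as page views, sum is the natural choice.

```python
from collections import defaultdict


def aggregate_without(samples, drop_labels, agg):
    """Aggregate samples after dropping the given labels (like PromQL `... without (...)`).

    `samples` is a list of (labels_dict, value) pairs. Series that differ only
    in the dropped labels (e.g. two console Pods) end up in the same group, and
    `agg` (such as `max` or `sum`) reduces each group's values to one number.
    Returns a dict keyed by the remaining labels as a sorted tuple of pairs.
    """
    drop = set(drop_labels)
    groups = defaultdict(list)
    for labels, value in samples:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        groups[key].append(value)
    return {key: agg(values) for key, values in groups.items()}
```

      In Prometheus itself, this kind of aggregation can be expressed as a recording rule, so that only the aggregated series would need to be forwarded to telemetry; whether that fits the telemetry pipeline is an assumption to confirm.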

       

      Is your team currently blocked?

      Not blocked, but getting this into 4.13 might be time-critical (my fault).

      Describe the business criticality and importance of this request.

      To better understand our users and to guide our investments in the console, it is a high priority for us to get more insight into how many users we have and which features our customers use.

      What is your explicit request to the team?

      First, get consultancy on the questions above.

      I will open a separate ticket to add these new metrics to telemetry (insights-operator).

       

      (If you require an initial meeting, it should not exceed 30 minutes.)

      I would be thankful for a meeting, but I'm also fine if you can answer all questions in this ticket, on Slack, or in the Google doc linked at the beginning.

       

            spasquie@redhat.com Simon Pasquier
            cjerolim Christoph Jerolimov
