Type: Task
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- consultancy

Blocked:
False
Blocked Reason:
None
Ready:
False
Docs QE Status:
NEW
Flagged:

Impediment
QE Status:
NEW
Intelligence Requested:
Market:

Sprint:
MON Sprint 235

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

We, the Developer Console team (ODC), want to add several telemetry metrics for ODC in 4.13. We track this work as Epic ODC-7171.

Quick note up front: I also wrote these questions into the doc below, where I will collect the answers I get on this ticket and on Slack:

https://docs.google.com/document/d/1_YH_lrWuaas294KzZjfSMpB8Q_yO5LScj-DWDyPUkt0/edit

I also started documentation (WIP) about our console metrics here:

https://docs.google.com/document/d/1PqbKv_-q2PW8mK3lwGEjpLwO5jdf9TojOchnjUY9YMU/edit

Describe the issue, question or problem you have.

The metrics we want to collect all relate to the question of whether, and how much, the console is used. We tried to focus on numbers that we cannot easily measure with our Segment telemetry (frontend analytics):

1. Is the console installed or not (as the console-operator is optional starting with 4.12)
2. How many users use the console (which kind of users: admins, devs)
3. Number of logins (and which kind of user: admins, devs)
4. Usage of the console (page views) and which perspective they are using (admin, developer, acm, others)
5. Did the customer restrict or disable a perspective (for example, disabled the admin perspective for normal developers)
6. Which plugins did the customer install

Question 1 (related to metric 1 above)

Metrics already exist to check whether the console is installed (and running). Should we open a request to add them to telemetry? Existing metrics that could answer this question are:
console_url
cluster_operator_up{name="console"}
cluster_version_capability{name="Console"}

No need to create our own _up metric for this, right?

Question 2 (related to metrics 2-4 above)

When we add a new console metric for how many users are using the console, and label it with the user type (admin or developer), is it later possible to get this information "per cluster"?

For example, suppose we add the new metric and later we (or you) add it to telemetry (insights-operator, right?). Does the job that pushes this information to the RH DataHub already add a cluster ID or something similar?

Could we then see in DataHub or Tableau, for example, the cluster ID (or name) with the most console users (admins and developers), or the most console developers (a label)?

Question 3 (related to metrics 2-3 above)

We don't expect the number of console users per cluster to change quickly. But because users can be removed, I think we need to use a Gauge instead of a Counter, right?

Is there a way to track this information less often? For example, just once or twice a day instead of every 4m30s? Does this matter to you?
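
To illustrate the Gauge-versus-Counter point, here is a minimal sketch in plain Python (no Prometheus client library). The class name `ConsoleUserGauge`, its methods, and the role names are hypothetical and exist only for this illustration. It is not how the console actually tracks users. It only shows that the reported value has to be able to go down when a user is removed, which a monotonically increasing Counter cannot express.

```python
class ConsoleUserGauge:
    """Hypothetical in-memory gauge: current number of console users per role."""

    def __init__(self):
        # Maps a user identifier to its role ("admin" or "developer").
        self._users = {}

    def set_user(self, user_id, role):
        """Record (or update) a user; re-adding the same user does not double count."""
        self._users[user_id] = role

    def remove_user(self, user_id):
        """Forget a user; the value for that user's role drops by one."""
        self._users.pop(user_id, None)

    def values(self):
        """Return the current user count per role, as a gauge would expose it."""
        counts = {}
        for role in self._users.values():
            counts[role] = counts.get(role, 0) + 1
        return counts


gauge = ConsoleUserGauge()
gauge.set_user("alice", "admin")
gauge.set_user("bob", "developer")
gauge.set_user("carol", "developer")
gauge.remove_user("bob")  # the developer count decreases, which a Counter could not do
print(gauge.values())
```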

Question 4 (related to metrics 5-6 above)

There are some configurations that a cluster-admin can make "per console" (= per cluster), and we would like to measure them. For example, if the cluster-admin installs a plugin "xyz", we want to add a counter console_plugin_installed{name="xyz"} 1, similar to the common *_up metrics.

I understand that the main concern is keeping cardinality low, and we don't expect many plugins here. Is this the way to go?
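
As a sketch of what such a metric could look like on the wire, the snippet below renders one sample per installed plugin in the Prometheus text exposition format. The metric name `console_plugin_installed` and the `name` label come from the proposal above. The function name `render_plugin_metrics`, the HELP text, and the choice to declare the type as `gauge` (as `*_up` style metrics usually are) are assumptions made for illustration.

```python
def render_plugin_metrics(plugin_names):
    """Return Prometheus text-format lines with one sample per installed plugin."""
    lines = [
        "# HELP console_plugin_installed Whether a console plugin is installed (1 = installed).",
        "# TYPE console_plugin_installed gauge",
    ]
    # De-duplicate and sort so the output is stable between scrapes.
    for name in sorted(set(plugin_names)):
        # Label values must escape backslash, double quote, and newline.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'console_plugin_installed{{name="{escaped}"}} 1')
    return "\n".join(lines) + "\n"


print(render_plugin_metrics(["xyz", "abc"]))
```

With this shape, cardinality grows with the number of distinct plugin names only, which matches the expectation that there will not be many plugins.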

General question

When the console Pod is restarted (for whatever reason), new cardinality is added because the metric on the cluster also carries the Pod name and Service name.

Is that an issue? Should we somehow (how?) remove the Pod and Service name by creating a new Prometheus query for the metrics we've added?
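
One possible way to think about this is an aggregation that drops the per-Pod labels, in the spirit of a PromQL expression such as `max without (pod, service) (console_plugin_installed)`. The sketch below simulates that idea in plain Python. The label names `pod` and `service`, the sample values, and the helper `aggregate_without` are assumptions for illustration, and whether `max` or `sum` is the right combiner depends on the metric.

```python
from collections import defaultdict


def aggregate_without(samples, drop_labels, combine=max):
    """Group samples by their labels minus drop_labels and combine each group's values.

    samples is a list of (labels_dict, value) pairs.
    """
    grouped = defaultdict(list)
    for labels, value in samples:
        # The remaining labels, as a sorted tuple, identify the aggregated series.
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop_labels))
        grouped[key].append(value)
    return {key: combine(values) for key, values in grouped.items()}


# Two console Pods (for example, before and after a restart) report the same plugin.
samples = [
    ({"name": "xyz", "pod": "console-abc", "service": "console"}, 1),
    ({"name": "xyz", "pod": "console-def", "service": "console"}, 1),
]
print(aggregate_without(samples, {"pod", "service"}))
```

After the aggregation, both per-Pod series collapse into a single series keyed only by the plugin name, so a Pod restart no longer adds a new series to the aggregated result.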

Is your team currently blocked?

Not blocked, but it might be time-critical (my fault) to get this in for 4.13.

Describe the business criticality and importance of this request.

To better understand our users and our investments in the console, it is a high priority for us to get more insight into how many users we have and which features our customers use.

What is your explicit request to the team?

First, consultancy on the questions above.

I will open another ticket to add these new metrics to telemetry (insights-operator).

(If you require an initial meeting, it should not exceed 30 minutes.)

I would be thankful for a meeting, but I'm fine if you can answer all questions in this ticket, on Slack, or in the Google doc I've linked at the beginning.

is related to

ODC-7171 [4.13] Improved telemetry (provide new metrics via insight)

Closed

is triggered by

OCPBUGS-12903 Add new console metrics to cluster-monitoring-operator telemetry configuration (4.14)

Closed

links to

CMO pull request

Assignee: Simon Pasquier

Reporter: Christoph Jerolimov

Votes: 0

Watchers: 3

Created: 2023/02/14 4:29 PM

Updated: 2023/05/09 7:14 AM

Resolved: 2023/05/09 7:13 AM
