Type: Story
Resolution: Done
Priority: Undefined
Fix Version/s: openshift-4.10
Affects Version/s: None
Component/s: shared-resources
Labels:
- groomed
- no-doc

Work Type: BU Product Work
Story Points: 5
Blocked: False
Ready: False
Epic Link: Tech Preview Shared Resource CSI Driver
Feature Link: OCPSTRAT-475 - Enable sharing ConfigMaps and Secrets across namespaces [Tech Preview]
Release Note Text: Undefined
Sprint: Sprint 207, Sprint 209, Sprint 210, Sprint 211, Sprint 212
sprint_count: 5

User Story

As an OpenShift cluster admin
I want to be able to monitor various statistics around the controller and CSI volume driver provided by https://github.com/openshift/csi-driver-projected-resource
So that I can understand the CSI driver's behavior

Acceptance Criteria

Cluster admins can monitor the number of volume mounts that succeed and fail.
Cluster admins can see the number of shares on the cluster.
During development, we do some scale testing to at least validate that the metrics are recorded correctly. Then, if we discover places where the driver can be improved, we either make those improvements under this Jira if they are containable, or we open BZs to track them.

Docs Impact

No impact to product documentation, since metrics are not in product documentation. The upstream repository should have markdown documentation on what metrics exist and what they mean.

QE Impact

CI tests will need to verify that we are exposing the appropriate metric through integration and e2e testing.
QE can verify by querying the metric in the Monitoring/Prometheus console.

PX Impact

None.

Notes

Docs on Prometheus

Metric types: https://prometheus.io/docs/concepts/metric_types/
Naming conventions: https://prometheus.io/docs/practices/naming/

Shared resource CSI driver instances will need to expose relevant metrics to Prometheus.
Prometheus has libraries which make it easy to set up an HTTP process that exposes the Prometheus metrics.
A Service needs to then expose the metric from the pods - see https://github.com/openshift/cluster-openshift-controller-manager-operator/blob/master/bindata/v3.11.0/openshift-controller-manager/svc.yaml
Monitoring team needs to be consulted.

Mount attempts that succeed or fail could be a counter. Probably one counter metric with two labels (can have more, ideally no more than 10 labels).
Number of shares on the cluster must be a gauge metric that is computed every time the Prometheus endpoint is hit.
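
A minimal sketch of the counter and gauge described above, written against the Go standard library only so that it stays self-contained. It hand-rolls the Prometheus text exposition format; a real driver would more likely use the Prometheus Go client library instead. The metric names `example_csi_mount_requests_total` and `example_csi_shares`, the `result` label, and every type and function name here are assumptions made for illustration, not the names the driver uses.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
	"sync"
)

// Hypothetical metric names; the real driver may use different ones.
const (
	mountMetric = "example_csi_mount_requests_total" // counter
	shareMetric = "example_csi_shares"               // gauge
)

// mountCounter is a stdlib stand-in for a labelled Prometheus counter.
type mountCounter struct {
	mu     sync.Mutex
	counts map[string]uint64 // keyed by the value of the "result" label
}

func newMountCounter() *mountCounter {
	return &mountCounter{counts: map[string]uint64{}}
}

// Inc records one mount attempt with the given result ("success" or "failure").
func (c *mountCounter) Inc(result string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[result]++
}

// render produces the metrics in the Prometheus text exposition format.
// countShares is called on every render, so the gauge is computed per scrape.
func render(c *mountCounter, countShares func() int) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# TYPE %s counter\n", mountMetric)
	c.mu.Lock()
	results := make([]string, 0, len(c.counts))
	for r := range c.counts {
		results = append(results, r)
	}
	sort.Strings(results) // keep the output order stable
	for _, r := range results {
		// %q is adequate for simple label values such as these.
		fmt.Fprintf(&b, "%s{result=%q} %d\n", mountMetric, r, c.counts[r])
	}
	c.mu.Unlock()
	fmt.Fprintf(&b, "# TYPE %s gauge\n", shareMetric)
	fmt.Fprintf(&b, "%s %d\n", shareMetric, countShares())
	return b.String()
}

// metricsHandler serves the rendered metrics, for example at /metrics.
func metricsHandler(c *mountCounter, countShares func() int) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, render(c, countShares))
	})
}

func main() {
	c := newMountCounter()
	c.Inc("success")
	c.Inc("failure")
	shares := []string{"share-a", "share-b"} // placeholder for the driver's share cache

	// Exercise the handler in-process instead of binding a port.
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/metrics", nil)
	metricsHandler(c, func() int { return len(shares) }).ServeHTTP(rec, req)
	fmt.Print(rec.Body.String())
}
```

Using a single counter with a `result` label keeps success and failure in one time series family, which makes a failure ratio easy to compute in a query. Passing the share count as a callback means the gauge reflects the state at scrape time rather than a value that must be kept in sync elsewhere.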

Exposing to Telemetry is out of scope - we can take this on in a separate story. See ~~BUILD-345~~

Open questions:

What are the metrics that a cluster admin would care about?

Number of mounts that succeed vs fail - in the future we can consider creating an alert.
Number of shares on the cluster - this impacts memory usage.
Number of unique Secrets and ConfigMaps that are shared - also impacts memory usage. Not sure if this is feasible with the current implementation.
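
One way the unique-resource count could be derived, sketched under the assumption that each share can be reduced to the kind, namespace, and name of the object it references. `ShareRef` and `countUniqueBackingResources` are hypothetical names for illustration; whether the driver's current implementation exposes this information is the open question noted above.

```go
package main

import "fmt"

// ShareRef is a hypothetical, simplified view of the object a share points at.
type ShareRef struct {
	Kind      string // "Secret" or "ConfigMap"
	Namespace string
	Name      string
}

// countUniqueBackingResources returns how many distinct Secrets and
// ConfigMaps are referenced, regardless of how many shares reference each.
func countUniqueBackingResources(shares []ShareRef) int {
	seen := make(map[ShareRef]struct{}, len(shares))
	for _, s := range shares {
		seen[s] = struct{}{} // struct values with comparable fields work as map keys
	}
	return len(seen)
}

func main() {
	shares := []ShareRef{
		{Kind: "Secret", Namespace: "ns-a", Name: "creds"},
		{Kind: "Secret", Namespace: "ns-a", Name: "creds"}, // second share of the same Secret
		{Kind: "ConfigMap", Namespace: "ns-a", Name: "settings"},
	}
	fmt.Println(countUniqueBackingResources(shares))
}
```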

What metrics do we get "out of the box" with Kubernetes/OpenShift?

The motivations for this story are similar to https://issues.redhat.com/browse/BUILD-246 in that both help understand how performant, scalable, and healthy the driver is.

image-2021-12-02-14-31-47-327.png
263 kB
2021/12/02 9:01 AM

blocks

BUILD-345 Expose CSI driver metrics to Telemetry

Closed

links to

openshift/csi-driver-shared-resource#54: BUILD-238: share metrics exposure, new monitoring objects

openshift/csi-driver-shared-resource-operator#12: WIP: BUILD-238: metrics for total shares of config maps and secrets

Assignee: Alice Rum (Inactive)

Reporter: Gabe Montero

Votes: 0

Watchers: 2

Created: 2021/03/08 3:33 PM

Updated: 2024/09/05 2:59 AM

Resolved: 2022/01/11 1:18 PM
