- Story
- Resolution: Done
- Major
User Story
As an OpenShift engineer and/or Project Manager
I want to see measures of build success and failure by strategy
So that I can analyze if customers are successfully using builds.
As a cluster admin or developer manager
I want to see measures of build success and failure by strategy
So that I can analyze if my developers are successfully using builds.
Acceptance Criteria
- Builds have a Counter metric that measures the results of finished builds live.
- Build results can be sliced by build strategy and final phase ("Completed", "Failed", "Error", etc.)
- Build results can be sliced by namespace and BuildConfig name.
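As a rough illustration of the criteria above, the sketch below models a labeled counter with a plain `collections.Counter`. It is not the build controller's code and does not use a Prometheus client. The label names and the helpers `record_finished_build` and `slice_by` are hypothetical, chosen only to mirror the slicing dimensions listed above.

```python
from collections import Counter

# Hypothetical label set, mirroring the dimensions in the acceptance criteria.
LABELS = ("namespace", "buildconfig", "strategy", "phase")

# Stands in for a labeled counter metric: one series per label combination.
finished_builds = Counter()


def record_finished_build(namespace, buildconfig, strategy, phase):
    """Increment the counter once for a build that reached a final phase."""
    finished_builds[(namespace, buildconfig, strategy, phase)] += 1


def slice_by(*label_names):
    """Sum the counter over every label that is not named in label_names."""
    idx = [LABELS.index(name) for name in label_names]
    out = Counter()
    for key, value in finished_builds.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


# Illustrative data only; namespaces and BuildConfig names are made up.
record_finished_build("team-a", "frontend", "Source", "Completed")
record_finished_build("team-a", "frontend", "Source", "Failed")
record_finished_build("team-b", "api", "Docker", "Completed")
record_finished_build("team-b", "api", "Docker", "Completed")

print(slice_by("strategy", "phase"))
print(slice_by("namespace"))
```

Because every increment carries all four labels, any coarser view (by strategy, by namespace, by final phase) can be obtained by summing, which is what a monitoring query would do on the real metric.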
Launch Checklist
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Notes
OpenShift clusters publish several metrics for builds. These are Gauge metrics that query all existing builds to compute their values. They are problematic because, by default, only the 5 most recent successful builds and the 5 most recent failed builds are retained. Over time, any success/failure metric derived from them will trend towards a 1:1 ratio.
Counters are not affected by build pruning and are useful for computing rates, increases, and similar aggregations over time.
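The drift towards a 1:1 ratio can be reproduced with a small simulation. This is a sketch under stated assumptions: a made-up sequence of 100 builds in which every tenth one fails, a retention limit of 5 per outcome as described above, and phase names as written in the acceptance criteria. The function `simulate` is hypothetical.

```python
from collections import Counter, deque

RETAIN = 5  # retention limit per outcome, as described in the notes


def simulate(phases, retain=RETAIN):
    """Return (gauge_view, counter_view) after the builds finish in order.

    gauge_view counts only builds that still exist after pruning;
    counter_view counts every build that ever finished.
    """
    retained = {
        "Completed": deque(maxlen=retain),
        "Failed": deque(maxlen=retain),
    }
    totals = Counter()
    for build_id, phase in enumerate(phases):
        retained[phase].append(build_id)  # deque drops the oldest beyond the limit
        totals[phase] += 1
    gauge = {phase: len(builds) for phase, builds in retained.items()}
    return gauge, dict(totals)


# Assumed workload: 100 builds, every tenth build fails (10% failure rate).
phases = ["Failed" if i % 10 == 9 else "Completed" for i in range(100)]
gauge, counter = simulate(phases)

print(gauge)    # {'Completed': 5, 'Failed': 5}
print(counter)  # {'Completed': 90, 'Failed': 10}
```

The gauge view reports equal numbers of successes and failures even though only one build in ten failed, while the counter view preserves the real ratio.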
The build controller has a function which updates builds upon completion. This would be an ideal location to add increments to the metric.
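One way to picture the increment is as a check on phase transitions, so that a build is counted once even if it is updated again after finishing. The sketch below is an assumption about how such a hook could behave, not the build controller's actual function. `on_build_update` and `FINAL_PHASES` are hypothetical names, and the phase strings follow the acceptance criteria.

```python
from collections import Counter

# Assumed set of final phases; the acceptance criteria list these "etc.".
FINAL_PHASES = {"Completed", "Failed", "Error"}

# Stands in for the counter metric, keyed by (strategy, final phase).
finished_builds = Counter()


def on_build_update(old_phase, new_phase, strategy):
    """Hypothetical completion hook: count a build when it first enters a final phase."""
    if new_phase in FINAL_PHASES and old_phase not in FINAL_PHASES:
        finished_builds[(strategy, new_phase)] += 1


on_build_update("Running", "Completed", "Source")
on_build_update("Completed", "Completed", "Source")  # repeated update, not counted again
on_build_update("Running", "Failed", "Docker")
on_build_update("Pending", "Running", "Docker")      # not a final phase, not counted

print(dict(finished_builds))
```

Guarding on the transition rather than on the final phase alone avoids double counting when a finished build is reconciled more than once.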
There was previous discussion with Ben, Clayton, and others regarding the metrics that are currently reported. There may be additional discussion about whether these metrics are correct.
Monitoring team will also need to sign off on this.
In the future we may want to add a label that shows the builder image for Source and
Open questions:
1. Do we need to initialize this measure on upgrade? This may not even be possible.
2. Will this create problems with the monitoring stack? What happens if Prometheus retains longer history?
Guiding Questions
User Story
- Is this intended for an administrator, application developer, or other type of OpenShift user?
- What experience level is this intended for? New, experienced, etc.?
- Why is this story important? What problems does this solve? What benefit(s) will the customer experience?
- Is this part of a larger epic or initiative? If so, ensure that the story is linked to the appropriate epic and/or initiative.
Acceptance Criteria
- How should a customer use and/or configure the feature?
- Are there any prerequisites for using/enabling the feature?
Notes
- Is this a new feature, or an enhancement of an existing feature? If the latter, list the feature and docs reference.
- Are there any new terms, abbreviations, or commands introduced with this story? Ex: a new command line argument, a new custom resource.
- Are there any recommended best practices when using this feature?
- On feature completion, are there any known issues that customers should be aware of?
- blocks
  - OCPBUILD-94 Build v1 - Send build result metrics to Telemetry (Closed)
- is blocked by
  - OCPBUILD-92 [builds] R&D Report Failure Reasons as Conditions (Closed)
  - OCPBUILD-116 Detailed Conditions for Builds (Closed)