-
Story
-
Resolution: Done
-
Normal
-
None
-
None
-
5
-
False
-
None
-
False
-
KONFLUX-123 - Konflux Availability SLO phase 1
-
Release Note Not Required
-
-
-
Pipelines Sprint Pioneers 10
Story (Required)
As a maintainer of Konflux trying to monitor Tekton health, I want to know when Tekton Results is deadlocked or suffering from significant performance degradation.
Background (Required)
Approximate, for Tekton Results, what has been done so far for the core Tekton pipeline controller.
Out of scope
Baselining the completion times of List and Get calls against the DB, taken from the histogram, is a separate exercise, because DB tuning, collaboration with Quay.io, and known, needed UI optimizations must happen first.
Approach (Required)
Once the memory leak is fixed and sufficient performance tuning is vetted, we establish baselines, excluding the remaining known log storage bugs, around:
- The API success rate metric we already expose
- Watcher work queue depth
- Watcher latency (though this will differ considerably from the pipeline or chains controller, since log storage has to be done on-thread)
- Success percentage (hopefully at least 95%) for the CreateRecord, CreateResult, UpdateRecord, UpdateResult, and UpdateLog gRPC calls
Extra credit: metrics that confirm the necessary labels, annotations, and finalizers are set.
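The baseline checks described in this approach could be sketched roughly as follows. This is only an illustration under assumptions: the per-method call counts and the sample lists are hypothetical inputs (they are not taken from the metrics Tekton Results actually exposes), the helper names are invented for this sketch, and the 95% figure is the hoped-for floor mentioned above rather than a vetted SLO.

```python
from statistics import quantiles

# The five gRPC calls named in the approach.
TRACKED_METHODS = (
    "CreateRecord",
    "CreateResult",
    "UpdateRecord",
    "UpdateResult",
    "UpdateLog",
)

# Hoped-for floor from the story; an assumption, not an agreed SLO.
TARGET_SUCCESS_PCT = 95.0


def success_percentage(ok: int, total: int) -> float:
    """Return the percentage of successful calls in one window.

    A window with no calls is treated as healthy (100%), which is a
    design choice of this sketch.
    """
    if total == 0:
        return 100.0
    return 100.0 * ok / total


def methods_below_target(counts, target=TARGET_SUCCESS_PCT):
    """List tracked methods whose success percentage is below target.

    counts maps a method name to (successful calls, total calls);
    methods missing from counts are treated as idle.
    """
    return [
        method
        for method in TRACKED_METHODS
        if success_percentage(*counts.get(method, (0, 0))) < target
    ]


def p95_baseline(samples):
    """95th percentile of historical samples, e.g. work queue depth or
    watcher latency. Needs at least two samples."""
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20, method="inclusive")[-1]


def exceeds_baseline(current, samples, factor=2.0):
    """Flag a reading above factor * the p95 baseline.

    The factor of 2.0 is a placeholder to be tuned, not a recommendation.
    """
    return current > factor * p95_baseline(samples)
```

For example, `methods_below_target({"CreateRecord": (99, 100), "UpdateLog": (90, 100)})` would flag only `UpdateLog`. Using a percentile of observed samples as the baseline, rather than a fixed number, keeps the check tied to whatever the vetted, tuned system actually does.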
Dependencies
<Describes what this story depends on. Dependent Stories and EPICs should be linked to the story.>
Acceptance Criteria (Mandatory)
<Describe edge cases to consider when implementing the story and defining tests>
<Provides a required and minimum list of acceptance tests for this story. More is expected as the engineer implements this story>
Done Checklist
- Code is completed, reviewed, documented and checked in
- Unit and integration test automation has been delivered and is running cleanly in the continuous integration/staging/canary environment
- Continuous Delivery pipeline(s) are able to proceed with the new code included
- Customer facing documentation, API docs etc. are produced/updated, reviewed and published
- Acceptance criteria are met
- blocks
-
SRVKP-5900 Update pipeline service SOPs in gitlab/app-interface, get tiger team sign off, for deadlocked metrics, anything else added, results
- Closed
- clones
-
SRVKP-4522 build metric to determine if core tekton controller is not creating pods for pipelines, determine if it is deadlocked
- Closed
- is cloned by
-
SRVKP-4529 build or expose metrics to determine if chains controller is deadlocked or performance severely degraded
- Closed