-
Story
-
Resolution: Won't Do
-
Major
-
None
-
3
-
Documentation (Ref Guide, User Guide, etc.), User Experience
Story (Required)
As a Pipelines admin, I want the load to be balanced across Pipelines controller replicas when I configure multiple replicas and tune the pipeline for performance.
Background (Required)
In Konflux, which is configured with two Pipelines controller replicas, restarting one replica causes the remaining active replica to acquire all leases. Once the second replica comes back online, it does not acquire any lease and does not share the load.
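The behavior described above can be illustrated with a small simulation. This is a deliberately simplified model, not knative/pkg's election code: the lease names, the replica names, the number of leases, and the `run_election_round` helper are all hypothetical. It only assumes that a live holder keeps renewing its lease, so a returning replica has nothing to acquire.

```python
from collections import Counter
from typing import Dict, List, Optional


def run_election_round(leases: Dict[str, Optional[str]], alive: List[str]) -> None:
    """Run one simplified election round (hypothetical model).

    A lease whose holder is still alive is renewed and stays where it is.
    A lease that is free, or whose holder is gone, goes to the first live replica.
    """
    for name, holder in leases.items():
        if holder in alive:
            continue  # the holder renews, so no other replica can take the lease
        leases[name] = alive[0] if alive else None


def holders(leases: Dict[str, Optional[str]]) -> Counter:
    """Count how many leases each replica currently holds."""
    return Counter(h for h in leases.values() if h is not None)


if __name__ == "__main__":
    # Ten hypothetical leases, initially split evenly between two replicas.
    leases: Dict[str, Optional[str]] = {
        f"bucket-{i}": ("controller-a" if i % 2 == 0 else "controller-b")
        for i in range(10)
    }
    print("initial:     ", dict(holders(leases)))

    # controller-b restarts: its leases lapse and controller-a picks them up.
    run_election_round(leases, alive=["controller-a"])
    print("b restarting:", dict(holders(leases)))

    # controller-b is back, but every lease is still being renewed by controller-a.
    run_election_round(leases, alive=["controller-a", "controller-b"])
    print("b back:      ", dict(holders(leases)))  # controller-a still holds all 10
```

In this model the imbalance persists until controller-a itself stops renewing, which matches the hot-spotting reported here.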
The Pipelines controller deployment currently offers only one HA mode: leader election.
There are two mechanisms available for scaling controllers horizontally via knative/pkg:
- Leader election, which allows for failover but can result in hot-spotting. This is the mode the Pipelines controller already supports.
- StatefulSet ordinals, which do not allow for failover but guarantee that keys are spread evenly across replicas.
The trade-offs are:
- Leader election recovers quickly, but can have up to a 100% outage during the failover (including rollouts).
- StatefulSet mode fails over more slowly, but guarantees that no more than 33% of the keys are unavailable per disrupted pod (for 3 replicas).
We need to add support for deploying the Pipelines controller in StatefulSet ordinal mode.
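To make the ordinal-based spread concrete, here is a minimal sketch of how keys could be assigned to replicas by ordinal. It is an illustration under stated assumptions, not the knative/pkg implementation: the `owner_ordinal` function, the SHA-256 hash-modulo scheme, and the sample keys are all hypothetical.

```python
import hashlib
from collections import Counter
from typing import List


def owner_ordinal(key: str, replicas: int) -> int:
    """Map a resource key to the ordinal of the replica that owns it.

    Hypothetical scheme: a stable hash of the key, modulo the replica count.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % replicas


def distribution(keys: List[str], replicas: int) -> Counter:
    """Count how many keys each ordinal owns (ordinals with no keys show 0)."""
    counts = Counter({ordinal: 0 for ordinal in range(replicas)})
    for key in keys:
        counts[owner_ordinal(key, replicas)] += 1
    return counts


if __name__ == "__main__":
    # Illustrative keys in "namespace/name" form; not taken from a real cluster.
    keys = [f"tenant-{i % 50}/pipelinerun-{i}" for i in range(9000)]
    counts = distribution(keys, replicas=3)
    for ordinal, count in sorted(counts.items()):
        print(f"controller-{ordinal}: {count} keys ({count / len(keys):.1%})")
```

Because ownership depends only on the key and the replica count, a restarted pod resumes the same share of keys when it returns. While it is down, roughly one third of the keys (for 3 replicas) have no owner.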
Out of scope
- Failover mechanisms related to leader election are not part of this story.
- Implementing load balancing for controllers not related to Pipelines.
Approach (Required)
<Description of the general technical path on how to achieve the goal of the story. Include details like json schema, class definitions>
Dependencies
<Describes what this story depends on. Dependent Stories and EPICs should be linked to the story.>
Acceptance Criteria (Mandatory)
- Load is balanced evenly across replicas when multiple Pipelines controller replicas run in StatefulSet mode.
- No more than 33% of the keys are unavailable per disrupted pod (for 3 replicas).
- StatefulSet ordinal mode functionality is validated through testing.
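One possible way to turn these criteria into an automated check is sketched below. The function names, the input shape (a count of owned keys per replica), and the 5% tolerance are assumptions made for illustration; this story does not define a tolerance or a way to collect the counts.

```python
from typing import Dict


def max_unavailable_share(keys_per_replica: Dict[str, int]) -> float:
    """Largest fraction of keys that becomes unavailable if one replica is disrupted."""
    total = sum(keys_per_replica.values())
    if total == 0:
        return 0.0
    return max(keys_per_replica.values()) / total


def is_evenly_balanced(keys_per_replica: Dict[str, int], tolerance: float = 0.05) -> bool:
    """True when every replica's share is within `tolerance` of an even 1/N split.

    The default tolerance is an assumed value, not one taken from this story.
    """
    replicas = len(keys_per_replica)
    total = sum(keys_per_replica.values())
    if replicas == 0 or total == 0:
        return False
    even_share = 1 / replicas
    return all(
        abs(count / total - even_share) <= tolerance
        for count in keys_per_replica.values()
    )


if __name__ == "__main__":
    # Made-up counts for illustration only.
    balanced = {"controller-0": 3010, "controller-1": 2985, "controller-2": 3005}
    hot_spotted = {"controller-0": 9000, "controller-1": 0, "controller-2": 0}
    for name, counts in (("balanced", balanced), ("hot-spotted", hot_spotted)):
        print(name, is_evenly_balanced(counts), f"{max_unavailable_share(counts):.1%}")
```

Note that a perfectly even split across 3 replicas gives each replica 33.3% of the keys, so the 33% figure is best read as "about one third" when choosing a threshold.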
INVEST Checklist
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Legend
Unknown
Verified
Unsatisfied
Done Checklist
- Code is completed, reviewed, documented and checked in
- Unit and integration test automation has been delivered and is running cleanly in the continuous integration/staging/canary environment
- Continuous Delivery pipeline(s) are able to proceed with the new code included
- Customer facing documentation, API docs etc. are produced/updated, reviewed and published
- Acceptance criteria are met