-
Epic
-
Resolution: Unresolved
-
Normal
-
None
-
None
-
cluster-wide architecture preferred affinity
-
False
-
None
-
False
-
Not Selected
-
NEW
-
To Do
-
33% To Do, 67% In Progress, 0% Done
-
L
-
3
Epic Goal
- To add a new field in the API that allows setting the preferredAffinity along with the requiredAffinity, so that users can fine-tune how workloads that support multiple architectures are distributed in a mixed-architecture cluster.
Why is this important?
- Users will be able to prefer scheduling workloads on some architectures over others.
- In the x86 + arm64 case, this will support a cost-effective deployment by prioritizing arm64 worker nodes and using amd64 nodes primarily for workloads that cannot support arm64.
Scenarios
1. [cost-reduction with arm64] Arm64 CP + Amd64 Workers + Arm64 Workers: minimize the use of amd64 workers by using them primarily for workloads that can't run on arm64.
2. [P AI accelerator]: reduce the load on P workers when AI workloads need to use the accelerator, and avoid wasting resources: P workers should not sit unused when no AI jobs are running just because other workloads are kept off them by taints/tolerations or requiredAffinity.
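To illustrate scenario 1, here is a toy model of how per-architecture weights could steer placement. It is a deliberately simplified sketch, not the scheduler's real algorithm: the actual Kubernetes scheduler combines node-affinity weights with many other scoring plugins. The function name `pick_node`, the node names, and the weight values are hypothetical.

```python
def pick_node(nodes, supported_arches, weights):
    """Pick a node for a workload in a simplified scoring model.

    nodes:            mapping of node name -> architecture
    supported_arches: architectures the workload's image supports
                      (plays the role of the required affinity)
    weights:          mapping of architecture -> preference weight
    """
    # Hard filter: only nodes whose architecture the workload supports.
    candidates = {n: a for n, a in nodes.items() if a in supported_arches}
    if not candidates:
        return None
    # Soft preference: highest weight wins; ties are broken by node name.
    return max(sorted(candidates), key=lambda n: weights.get(candidates[n], 0))


nodes = {"worker-amd64": "amd64", "worker-arm64": "arm64"}
weights = {"arm64": 50, "amd64": 10}  # hypothetical values

# A multi-arch workload is steered to the arm64 worker.
multi_arch_choice = pick_node(nodes, {"amd64", "arm64"}, weights)
# An amd64-only workload still lands on the amd64 worker.
amd64_only_choice = pick_node(nodes, {"amd64"}, weights)
```

In this model the amd64 worker is used only by workloads that cannot run on arm64, which is the cost-reduction behaviour the scenario describes.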
Acceptance Criteria
- A new API is added to automatically set the preferredAffinity cluster-wide with weights chosen by the user
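As a rough sketch of what this criterion could mean at the pod level, the snippet below turns a user-chosen map of architecture weights into Kubernetes `preferredDuringSchedulingIgnoredDuringExecution` node-affinity terms on the `kubernetes.io/arch` label and appends them to a pod spec. The function names and the shape of the weights input are assumptions for illustration only; the epic does not define the API yet.

```python
import copy

ARCH_LABEL = "kubernetes.io/arch"  # well-known Kubernetes node label


def preferred_terms(weights):
    """Build preferred node-affinity terms from {architecture: weight}."""
    terms = []
    for arch, weight in sorted(weights.items()):
        # Kubernetes accepts weights in the range 1-100 for preferred terms.
        if not 1 <= weight <= 100:
            raise ValueError(f"weight for {arch} must be in 1..100")
        terms.append({
            "weight": weight,
            "preference": {
                "matchExpressions": [
                    {"key": ARCH_LABEL, "operator": "In", "values": [arch]}
                ]
            },
        })
    return terms


def apply_arch_weights(pod_spec, weights):
    """Return a copy of pod_spec with the preferred terms appended.

    Existing affinity settings, including any required node affinity,
    are left untouched.
    """
    spec = copy.deepcopy(pod_spec)
    node_affinity = spec.setdefault("affinity", {}).setdefault("nodeAffinity", {})
    node_affinity.setdefault(
        "preferredDuringSchedulingIgnoredDuringExecution", []
    ).extend(preferred_terms(weights))
    return spec


# Hypothetical pod spec and weights.
pod = {"containers": [{"name": "app", "image": "example"}]}
patched = apply_arch_weights(pod, {"arm64": 50, "amd64": 10})
```

Appending rather than replacing keeps any preferences the user already set on the pod; whether the real implementation should merge, append, or skip such pods is a design decision this sketch does not settle.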
Dependencies (internal and external)
1. …
Previous Work (Optional):
1. …
Open questions:
1. Does the current schedulingGates-based implementation allow amending the preferredAffinity?
Done Checklist
- CI - For new features (non-enablement), existing Multi-Arch CI jobs are not broken by the Epic
- Release Enablement: <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - If the Epic is adding a new stream, downstream build attached to advisory: <link to errata>
- QE - Test plans in Test Plan tracking software (e.g. Polarion, RQM, etc.): <link or reference to the Test Plan>
- QE - Automated tests merged: <link or reference to automated tests>
- QE - QE to verify documentation when testing
- DOC - Downstream documentation merged: <link to meaningful PR>
- All the stories, tasks, sub-tasks and bugs that belong to this epic need to have been completed and indicated by a status of 'Done'.
- clones
-
MULTIARCH-4252 As a user I want to setup per-namespace weights for the architectures to be consumed by pods in a mixed cluster
- To Do