Type: Epic
Resolution: Done
Priority: Normal
Fix Version/s: mto-1.1
Affects Version/s: None
Component/s: Multiarch-Tuning-Operator
Labels:
- MIXEDARCH
- multiarch-manager-operator

Epic Name:
cluster-wide architecture preferred affinity
Activity Type:
Product / Portfolio Work
Parent Link:
OCPSTRAT-1888 Multi-arch Tuning Operator: Cluster-wide architecture preferred/weighted affinity
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done
Blocked:
False
Blocked Reason:

None
Ready:
False
Color Status:
Not Selected
Status Summary:

[5 Mar] <GREEN> Green

Last PRs merged. Pending documentation and Technical Enablement slides.

[26 Feb] <YELLOW> Yellow

1 PR pending, waiting on e2e tests. The aim is to have this done by the end of the week.

[12 Feb] <GREEN> GREEN

Last 2 PRs are pending.

[29 Jan] <GREEN> GREEN

1 PR pending merge, 1 dev complete, 1 still in progress.

[22 Jan] <GREEN> GREEN
Size:
L

Target Version:
None
Release Blocker:
None

Epic Goal

Add a new field to the API that allows setting the preferredAffinity alongside the requiredAffinity, so that users can fine-tune how workloads that support multiple architectures are distributed in a mixed-architecture cluster.

Why is this important?

Users will be able to favor scheduling workloads on some architectures over others.
In the x86 + arm64 case, this supports a cost-effective deployment by prioritizing arm64 worker nodes and using amd64 nodes primarily for workloads that cannot run on arm64.

Scenarios
1. [cost-reduction with arm64] Arm64 CP + Amd64 Workers + Arm64 Workers: minimize the use of amd64 workers by using them primarily for workloads that can't run on arm64.
2. [P AI accelerator]: reduce the load on P workers when AI workloads need the accelerator, and avoid wasting resources: P workers should not sit unused when no AI jobs are running just because other workloads are kept off them by taints/tolerations or requiredAffinity.
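
To illustrate scenario 1, the following is a minimal sketch of how a hard architecture requirement and a weighted preference interact: nodes whose architecture the workload cannot run on are filtered out first, and the remaining nodes are ranked by the weight assigned to their architecture. The function name `rank_nodes`, the node dictionaries, and the weight values are hypothetical and chosen only for illustration. The real Kubernetes scheduler combines the preferred-affinity score with other scoring plugins, so this models only the affinity part.

```python
def rank_nodes(nodes, supported_archs, weights):
    """Rank candidate nodes for one workload.

    nodes:           list of {"name": str, "arch": str} (hypothetical shape)
    supported_archs: architectures the workload's images can run on;
                     this models the required (hard) constraint
    weights:         user-chosen weight per architecture;
                     this models the preferred (soft) constraint
    """
    # Hard constraint: drop nodes the workload cannot run on at all.
    feasible = [n for n in nodes if n["arch"] in supported_archs]
    # Soft constraint: higher weight first; unknown architectures score 0.
    return sorted(feasible, key=lambda n: weights.get(n["arch"], 0), reverse=True)


nodes = [
    {"name": "amd-1", "arch": "amd64"},
    {"name": "arm-1", "arch": "arm64"},
]
weights = {"arm64": 50, "amd64": 10}  # illustrative values only

# A multi-arch workload lands on arm64 first.
multi_arch = rank_nodes(nodes, {"amd64", "arm64"}, weights)
# An amd64-only workload can still use the amd64 worker.
amd64_only = rank_nodes(nodes, {"amd64"}, weights)
```

With these weights, the multi-arch workload prefers `arm-1`, while the amd64-only workload is still schedulable on `amd-1`, which is the cost-reduction behavior the scenario describes.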

Acceptance Criteria

A new API is added that automatically sets the preferredAffinity cluster-wide, with weights chosen by the user.
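
As a rough sketch of what setting the preferredAffinity with user-chosen weights could produce on a pod, the example below builds `preferredDuringSchedulingIgnoredDuringExecution` terms keyed on the well-known `kubernetes.io/arch` node label and appends them to a pod manifest held as a plain dictionary. The helper names `preferred_arch_terms` and `apply_preferred_affinity` and the weight values are assumptions made for illustration; this ticket does not specify the operator's actual API shape or implementation.

```python
ARCH_LABEL = "kubernetes.io/arch"  # well-known Kubernetes node label


def preferred_arch_terms(weights):
    """Build preferred node-affinity terms from {architecture: weight}.

    Kubernetes requires each weight to be in the range 1-100.
    """
    terms = []
    for arch, weight in sorted(weights.items()):
        if not 1 <= weight <= 100:
            raise ValueError(f"weight for {arch} must be in 1..100, got {weight}")
        terms.append({
            "weight": weight,
            "preference": {
                "matchExpressions": [
                    {"key": ARCH_LABEL, "operator": "In", "values": [arch]},
                ]
            },
        })
    return terms


def apply_preferred_affinity(pod, weights):
    """Append architecture preferences to a pod manifest (hypothetical helper).

    Existing preferred terms are kept; the new ones are added after them.
    """
    node_affinity = (
        pod.setdefault("spec", {})
        .setdefault("affinity", {})
        .setdefault("nodeAffinity", {})
    )
    node_affinity.setdefault(
        "preferredDuringSchedulingIgnoredDuringExecution", []
    ).extend(preferred_arch_terms(weights))
    return pod


# Illustrative weights: prefer arm64, fall back to amd64.
pod = apply_preferred_affinity({"spec": {}}, {"arm64": 50, "amd64": 10})
terms = pod["spec"]["affinity"]["nodeAffinity"][
    "preferredDuringSchedulingIgnoredDuringExecution"
]
```

Appending rather than replacing is a design choice in this sketch so that preferences already declared by the workload owner are not lost; whether the operator should behave that way is not stated in this epic.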

Dependencies (internal and external)
1. …

Previous Work (Optional):
1. …

Open questions:
1. Does the current schedulingGates implementation allow amending the preferredAffinity?

Done Checklist

CI - For new features (non-enablement), existing Multi-Arch CI jobs are not broken by the Epic
Release Enablement: <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - If the Epic is adding a new stream, downstream build attached to advisory: <link to errata>
QE - Test plans in Test Plan tracking software (e.g. Polarion, RQM, etc.): <link or reference to the Test Plan>
QE - Automated tests merged: <link or reference to automated tests>
QE - QE to verify documentation when testing
DOC - Downstream documentation merged: <link to meaningful PR>
All the stories, tasks, sub-tasks and bugs that belong to this epic need to have been completed and indicated by a status of 'Done'.