      Feature Overview (aka. Goal Summary)  

      Recently, cloud vendors have started offering compute nodes based on different CPU architectures. With the rising use and support of multi-architecture compute nodes in OpenShift clusters, users and cluster administrators face new challenges when deploying workloads. In particular, the scheduler does not consider the compatibility of container images with the nodes' CPU architectures when filtering nodes. We must assume that the images used for deploying workloads may support only a subset of the architectures present in the cluster. The typical workaround is to manually add affinities to the pod spec to ensure that the pods land on nodes where the image binaries can be executed. This strongly impacts the user experience: it couples the pod spec to the set of architectures that happen to be compatible at a given point in time, does not scale well, and is challenging to maintain. The multiarch-manager-operator aims to automate the inspection of container images, derive the set of architectures supported by a pod, and use it to automatically define strong predicates based on the kubernetes.io/arch label in the pod's nodeAffinity.
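
      The manual workaround mentioned above would typically look like the following sketch: a required nodeAffinity term on the well-known kubernetes.io/arch node label, enumerating the architectures the image is known to be built for. The pod name, image, and architecture list here are purely illustrative:

        apiVersion: v1
        kind: Pod
        metadata:
          name: example-workload                  # illustrative name
        spec:
          containers:
            - name: app
              image: quay.io/example/app:latest   # hypothetical image
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: kubernetes.io/arch
                        operator: In
                        values:                   # architectures the image is assumed to support
                          - amd64
                          - arm64

      The operator described in this feature aims to compute the values list automatically from the image manifests, instead of requiring it to be hard-coded and maintained in every pod spec template.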

      Goals (aka. expected user outcomes)

      Provide a way to ensure that pods are scheduled on nodes running a CPU architecture supported by all the images of their containers. If no such node exists, the pod remains in the Pending phase, giving the user a clear indication that the node affinity constraint cannot be satisfied.

      Requirements (aka. Acceptance Criteria):

      • Easy adoption of multi-architecture compute clusters by customers with varying workload images, which might not be built for all of the architectures present in the cluster
      • Provide users a "clean" experience, without having to manually modify deployment specs when deploying workloads on a multi-arch compute cluster

       

      Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it has been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, ensure you provide the OCPSTRAT (for the future to-be-supported configuration) as well.

      Deployment considerations | List applicable specific needs (N/A = not applicable)
      Self-managed, managed, or both | N/A
      Classic (standalone cluster) | N/A
      Hosted control planes | N/A
      Multi node, Compact (three node), or Single node (SNO), or all | N/A
      Connected / Restricted Network | N/A
      Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | x86_64, ARM (aarch64)
      Operator compatibility | N/A
      Backport needed (list applicable versions) | N/A
      UI need (e.g. OpenShift Console, dynamic plugin, OCM) | N/A
      Other (please specify) | This will be a separate operator available via OLM (see the Subscription sketch below)
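
      Since the operator is planned to be delivered separately via OLM, installation would typically go through an OLM Subscription. The sketch below is only indicative; the package name, channel, catalog source, and namespace are assumptions and not defined by this feature:

        apiVersion: operators.coreos.com/v1alpha1
        kind: Subscription
        metadata:
          name: multiarch-manager-operator        # assumed subscription name
          namespace: openshift-operators          # assumed install namespace
        spec:
          channel: stable                         # assumed channel
          name: multiarch-manager-operator        # assumed OLM package name
          source: redhat-operators                # assumed catalog source
          sourceNamespace: openshift-marketplace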

      Use Cases (Optional):

      • As a user who wants to deploy workloads on a multi-arch compute cluster, I want the pods to be scheduled on the appropriate subset of nodes, based on the architectures supported by their containers' images, so that the coupling between the characteristics of the images and the nodeAffinity declared in the pod spec template does not require manual maintenance whenever the external stakeholders maintaining those images update them
      • As a cluster administrator, when I add a new node of a secondary architecture to a cluster, I do not want to have to taint it as unschedulable to ensure that workloads continue to deploy and run as expected, given that the images used for deploying workloads may not support that secondary architecture
      • As a cluster administrator who wants to migrate from a single-arch to a multi-arch compute cluster, I do not want to rebuild each container image for all the architectures present in the cluster
      • As a user who wants to deploy workloads on a multi-arch compute cluster, I do not want to modify every pod spec (or the spec of any resource owning pods) to include a node affinity predicate that filters the nodes at scheduling time based on the compatible architectures (see the sketch after this list)
      • As an operator developer, I want to allow users to develop operators that support only a subset of architectures, or a single architecture, without additional business logic changes or requirements to specify any affinities themselves. This will not be encouraged, but it will not become an impediment to rolling out a feature that can work on multi-arch compute clusters
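
      As a rough illustration of how a cluster administrator might opt into this behavior cluster-wide without touching individual workloads, the operator could expose a cluster-scoped configuration resource. The API group, kind, and fields below are illustrative assumptions only, not an agreed API:

        apiVersion: multiarch.openshift.io/v1beta1   # assumed API group/version
        kind: ClusterPodPlacementConfig              # assumed kind
        metadata:
          name: cluster
        spec:
          # Hypothetical field: restrict automatic nodeAffinity patching to
          # namespaces that are not explicitly excluded.
          namespaceSelector:
            matchExpressions:
              - key: multiarch.openshift.io/exclude-pod-placement
                operator: DoesNotExist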

      Questions to Answer (Optional):

      Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

      <your text here>

      Out of Scope

      • This feature does not aim to modify the Kubernetes scheduler
      • This feature does not "schedule" pods; it influences scheduling by patching the affinity of gated pods (see the sketch after this list)
      • Modification of fields other than the node affinity is not supported. Moreover, the controller logic will only consider the kubernetes.io/arch label
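
      "Gated pods" refers to the Kubernetes scheduling readiness mechanism (spec.schedulingGates): a newly created pod carries a gate that prevents the scheduler from binding it until the gate is removed. The sketch below shows what such a pod could look like before the operator inspects its images, patches the nodeAffinity, and removes the gate; the gate name is an assumption for illustration:

        apiVersion: v1
        kind: Pod
        metadata:
          name: example-workload
        spec:
          # Gate name is illustrative; the operator would define the actual name.
          schedulingGates:
            - name: multiarch.openshift.io/scheduling-gate
          containers:
            - name: app
              image: quay.io/example/app:latest

      Once the set of supported architectures is computed, the operator would patch the pod's nodeAffinity as in the earlier sketch and remove the gate so that the default scheduler can proceed.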

      Background

      Provide any additional context that is needed to frame the feature.  Initial completion during Refinement status.

      <your text here>

      Customer Considerations

      Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

      <your text here>

      Documentation Considerations

      Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

      <your text here>

      Interoperability Considerations

      Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

      <your text here>
