Feature | Resolution: Done | Major | None | BU Product Work | False | False | 0% To Do, 0% In Progress, 100% Done | L | 0 | Program Call
Provide a way to ensure that pods are scheduled on nodes running a CPU architecture supported by all the images of their containers.
Feature Overview (aka. Goal Summary)
Recently, cloud vendors have started offering compute nodes based on different CPU architectures. With the rising use and support of multi-architecture compute nodes in OpenShift clusters, users and cluster administrators face new challenges when deploying workloads. In particular, the scheduler does not take the images' compatibility with the nodes' CPU architectures into account when filtering nodes: we must assume that the images used for deploying workloads may support only a subset of the architectures present in the cluster. The typical workaround is to manually add affinities to the pod spec to ensure that the pods land on nodes where the image binaries can be executed. This strongly impacts the user experience: it couples the set of compatible architectures at a given point in time to the pod spec, does not scale well, and is challenging to maintain. The multiarch-manager-operator aims to automate the inspection of the container images, derive the set of architectures supported by a pod, and use it to automatically define strong predicates based on the kubernetes.io/arch label in the pod's nodeAffinity.
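For illustration, the manual workaround mentioned above typically looks like the following pod spec fragment (the pod name and image are placeholders); this is exactly the kind of hand-maintained affinity the operator is meant to generate automatically:

apiVersion: v1
kind: Pod
metadata:
  name: example-workload            # placeholder name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:                  # must be kept in sync by hand with the
            - amd64                  # architectures the image actually ships
            - arm64
  containers:
  - name: app
    image: quay.io/example/app:latest   # placeholder image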
Goals (aka. expected user outcomes)
Provide a way to ensure that pods are scheduled on nodes running a CPU architecture supported by all the images of their containers. If no such node exists, the pod remains in the Pending phase, giving the user a clear indication that the node affinity constraint cannot be satisfied.
Requirements (aka. Acceptance Criteria):
- Easy adoption of multi-architecture compute clusters by customers whose workload images might not be built for every architecture in the cluster
- Provide users a "clean" experience, without having to manually modify deployment specs when deploying workloads on a multi-arch compute cluster
Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, provide the OCPSTRAT for the configuration to be supported in the future.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | n/a |
Classic (standalone cluster) | n/a |
Hosted control planes | n/a |
Multi node, Compact (three node), Single node (SNO), or all | n/a |
Connected / Restricted Network | n/a |
Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | x86_64, ARM (aarch64) |
Operator compatibility | n/a |
Backport needed (list applicable versions) | n/a |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | n/a |
Other (please specify) | This will be a separate operator available via OLM |
Use Cases (Optional):
- As a user who wants to deploy workloads on a multi-arch compute cluster, I want the pods to be scheduled on the appropriate subset of nodes based on the architectures supported by their containers' images, so that the coupling between the characteristics of the images and the nodeAffinity declared in the pod spec template does not require manual maintenance whenever the external stakeholders maintaining those images update them
- As a cluster administrator, when I add a node of a secondary architecture to a cluster, I do not want to taint it as unschedulable just to ensure that existing workloads, whose images may not support that secondary architecture, continue to deploy and run as expected
- As a cluster administrator who wants to migrate from a single-arch to a multi-arch compute cluster, I do not want to rebuild every container image for all the architectures present in the cluster
- As a user who wants to deploy workloads on a multi-arch compute cluster, I do not want to modify every pod spec (or the spec of any resource owning pods) to include a node affinity predicate that filters nodes at scheduling time based on the compatible architectures (see the sketch after this list)
- As an operator developer, I want to be able to develop operators that support only a subset of architectures, or a single architecture, without additional business logic changes and without requiring users to specify any affinities themselves. This will not be encouraged, but it will not become an impediment to rolling out a feature that can work on multi-arch compute clusters
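For example (names and image are placeholders), this is the kind of plain Deployment a user should be able to apply as-is, with no architecture-related affinity; the operator is expected to derive the constraint from the image manifests and apply it to the pods the Deployment creates:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                       # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app
        image: quay.io/example/app:latest  # placeholder image; may support only
                                           # a subset of the cluster's architectures
      # note: no nodeAffinity is declared here by the user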
Questions to Answer (Optional):
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
Out of Scope
- This feature does not aim to modify the Kubernetes scheduler
- This feature does not "schedule" pods; it influences scheduling by patching the affinity of gated pods1
- Modification of fields other than the node affinity is not supported. Moreover, the controller logic will only consider the kubernetes.io/arch label
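As a rough sketch of the mechanism referenced above (the scheduling gate name below is illustrative, not necessarily the operator's actual identifier): a newly admitted pod is held by a scheduling gate, the controller inspects the container image manifests, patches only the kubernetes.io/arch term of the node affinity, and then removes the gate so the default scheduler can place the pod:

# 1. Pod as created: held back from scheduling by a gate (illustrative name).
apiVersion: v1
kind: Pod
metadata:
  name: example-workload
spec:
  schedulingGates:
  - name: example.openshift.io/multiarch   # illustrative gate name
  containers:
  - name: app
    image: quay.io/example/app:latest
# 2. After image inspection, the controller patches only the node affinity on
#    the kubernetes.io/arch label and removes the gate:
#    spec:
#      affinity:
#        nodeAffinity:
#          requiredDuringSchedulingIgnoredDuringExecution:
#            nodeSelectorTerms:
#            - matchExpressions:
#              - key: kubernetes.io/arch
#                operator: In
#                values: [amd64, arm64]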
Background
Provide any additional context needed to frame the feature. Initial completion during Refinement status.
<your text here>
Customer Considerations
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Documentation Considerations
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Interoperability Considerations
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>