-
Epic
-
Resolution: Unresolved
On Cluster Layering Enhancements
To Do
Summary
This epic enables OCL (On Cluster Layering) to support multi-arch builds on multi-architecture (heterogeneous) clusters. The main goal is to implement a workflow that builds architecture-specific images, pushes them to an internal registry, and assembles them into a multi-arch manifest list. Without a multi-arch manifest list, OpenShift might try to run, for example, an AMD image on an ARM node, causing a failure. With multi-arch image support, OpenShift automatically pulls the correct architecture-specific image.
Key Objectives
- Multi-arch build jobs: Create one build job per architecture detected on the parent cluster, each able to run natively on its respective nodes. Possible architectures are amd64, arm64, ppc64le, and s390x.
- Manifest list assembly: Aggregate arch-specific images into a manifest list and push to Quay.
- Cluster architecture detection: Determine if the cluster is multi-arch to enable/disable multi-arch workflows.
- MOSB controller updates: Handle 1:N relationships (one MOSB to multiple jobs) and track statuses.
- Spike: We may need to rework how MachineOSBuilds pair with jobs. Currently the relationship is 1:1: for every MachineOSBuild there is exactly one job. This could mean that customers would have to run multiple MachineOSBuilds. We do not yet know whether this needs to be reworked so that a single MachineOSBuild can run multiple jobs.
- User Experience: Ensure changes to architecture order in MOSC don’t trigger unnecessary rebuilds.
- Spike: For example, if the MOSC is changed for one architecture but not the other, should that trigger a rebuild for both?
Multi-Arch Build Process
- Create separate build jobs, one for each architecture present in the customer’s cluster.
- Each job should run natively on its respective node type
- Detect if the cluster is homogeneous (single arch) or heterogeneous (multi-arch).
- Push built images to an internal registry with arch-specific tags.
- Implement a push job that:
- Waits for all N builds to complete (N = 2 when building for two architectures).
- Assembles them into a manifest list.
- Pushes the final multi-arch image to Quay.
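The per-architecture build step above can be sketched as plain Kubernetes Job manifests. This is a minimal illustration, not the MCO implementation: the job naming scheme, the `example.com/...` labels, the builder image, and the `TARGET_IMAGE` variable are all hypothetical. The only real identifier is the well-known node label `kubernetes.io/arch`, used here to pin each job to nodes of its own architecture.

```python
def arch_tag(base_tag: str, arch: str) -> str:
    """Derive an arch-specific tag, for example 'rendered-worker-amd64'."""
    return f"{base_tag}-{arch}"


def build_job_for_arch(mosb_name: str, arch: str, repo: str, base_tag: str) -> dict:
    """Return a batch/v1 Job manifest pinned to nodes of one architecture."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": f"build-{mosb_name}-{arch}",  # hypothetical naming scheme
            "labels": {  # hypothetical label keys
                "example.com/mosb": mosb_name,
                "example.com/arch": arch,
            },
        },
        "spec": {
            "template": {
                "spec": {
                    # Well-known node label; keeps the build native to the arch.
                    "nodeSelector": {"kubernetes.io/arch": arch},
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "image-build",
                            "image": "example.com/builder:latest",  # placeholder
                            "env": [
                                {
                                    "name": "TARGET_IMAGE",  # hypothetical variable
                                    "value": f"{repo}:{arch_tag(base_tag, arch)}",
                                }
                            ],
                        }
                    ],
                }
            }
        },
    }


def build_jobs(mosb_name: str, archs: list, repo: str, base_tag: str) -> list:
    """One job per distinct architecture, in a stable (sorted) order."""
    return [build_job_for_arch(mosb_name, a, repo, base_tag) for a in sorted(set(archs))]
```

Sorting and de-duplicating the architecture list keeps the generated jobs stable regardless of the order in which architectures are reported.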
Handling Multi-Arch Metadata
- Determine how to structure Containerfiles for different architectures.
- Decide whether to:
- Have a single Containerfile with conditionals for each architecture.
- Maintain separate Containerfiles per architecture.
- Research the best way to attach architecture metadata in the build process.
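One way to keep both options above open is a lookup that prefers an architecture-specific Containerfile and falls back to a shared one. The sketch below assumes a hypothetical mapping from architecture to Containerfile content; nothing here reflects an existing MOSC field.

```python
def resolve_containerfile(arch: str, per_arch: dict, default=None) -> str:
    """Pick the Containerfile for an architecture.

    per_arch: hypothetical mapping of architecture -> Containerfile content.
    default:  shared Containerfile used when no arch-specific entry exists.
    """
    if arch in per_arch:
        return per_arch[arch]
    if default is not None:
        return default
    raise KeyError(f"no Containerfile available for architecture {arch!r}")
```

With this shape, a cluster that only needs one shared Containerfile supplies `default` alone, while a cluster with an ARM-specific customization adds a single `per_arch` entry.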
Build Controller Enhancements
- Modify the build controller to track multiple jobs per MOSB.
- Handle the scenario where (for example) an ARM update doesn’t require an AMD rebuild (currently, MachineOSBuild assumes a one-to-one relationship).
- Investigate how to bind architecture-specific builds without triggering unnecessary re-runs.
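Tracking multiple jobs per MOSB implies folding N job states into one MOSB state. The following is a sketch of one possible aggregation rule; the state names and the rule itself are assumptions for illustration, not the existing controller's conditions.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


def aggregate_mosb_state(job_states: dict) -> JobState:
    """Fold per-architecture job states into a single MOSB state.

    job_states maps architecture -> JobState (hypothetical bookkeeping).
    """
    if not job_states:
        return JobState.PENDING
    states = set(job_states.values())
    if JobState.FAILED in states:
        # Any failed architecture fails the whole build.
        return JobState.FAILED
    if states == {JobState.SUCCEEDED}:
        return JobState.SUCCEEDED
    if states == {JobState.PENDING}:
        return JobState.PENDING
    # Mixed states (for example one job done, one still building).
    return JobState.RUNNING
```

Treating any failure as a failure of the whole MOSB is the strictest choice; a controller that allows partial updates would need a different rule.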
Avoiding Unnecessary Builds
- Ensure builds are triggered only when actual changes occur.
- Prevent unnecessary rebuilds if the order of architectures changes in MOSC.
- This list can be expanded; no further cases have been identified yet.
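Both points above can be illustrated with hashing. The sketch below assumes the build inputs are a Containerfile, a base image reference, and a list of architectures (hypothetical field names); it hashes a canonical form so that reordering architectures does not change the result, and compares per-architecture hashes to find which builds actually need to rerun.

```python
import hashlib
import json


def build_input_hash(containerfile: str, base_image: str, architectures: list) -> str:
    """Hash the build inputs in a canonical, order-insensitive form."""
    canonical = json.dumps(
        {
            "containerfile": containerfile,
            "baseImage": base_image,
            # Sort and de-duplicate so list order never affects the hash.
            "architectures": sorted(set(architectures)),
        },
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def archs_needing_rebuild(previous: dict, current: dict) -> list:
    """Return architectures whose input hash is new or has changed.

    Both arguments map architecture -> input hash (hypothetical bookkeeping).
    """
    return sorted(
        arch for arch, digest in current.items() if previous.get(arch) != digest
    )
```

Under these assumptions, changing only the ARM inputs yields a rebuild list containing only `arm64`, leaving the existing AMD image untouched.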
Cluster Architecture Detection
- Modify the Build Controller to detect the cluster’s architecture using:
oc get nodes -o json | jq '.items[].status.nodeInfo.architecture'
- Store this information in:
- An MCP annotation.
- A ConfigMap that jobs can read.
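The detection and storage steps above can be sketched by parsing the same node JSON that the `oc get nodes -o json` command returns. The ConfigMap name and data keys below are hypothetical; the field path `.status.nodeInfo.architecture` is the one used in the command above.

```python
import json


def detect_architectures(nodes_json: str) -> list:
    """Return the sorted, de-duplicated architectures reported by the nodes."""
    nodes = json.loads(nodes_json)
    archs = {
        item["status"]["nodeInfo"]["architecture"]
        for item in nodes.get("items", [])
    }
    return sorted(archs)


def is_multi_arch(archs: list) -> bool:
    """A cluster is treated as heterogeneous when more than one arch is in use."""
    return len(archs) > 1


def arch_configmap(archs: list, name: str = "cluster-architectures",
                   namespace: str = "openshift-machine-config-operator") -> dict:
    """Build a ConfigMap manifest that build jobs could read (hypothetical name and keys)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        # ConfigMap data values must be strings.
        "data": {
            "architectures": ",".join(archs),
            "multiArch": "true" if is_multi_arch(archs) else "false",
        },
    }
```

Because detection here is based on the architectures actually present on nodes, a multi-arch-capable cluster that is run homogeneously is reported as single-arch.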
Storage & Deployment
- Define where to store images temporarily before pushing them to Quay.
- Ensure push jobs run on the correct nodes (for example, ARM builds only on ARM nodes).
Considerations & Open Questions
- Architecture Binding:
- How do we bind the correct architecture to each build job?
- Should we annotate each MOSB with its architecture?
- Cluster Topology:
- How do we detect if the cluster supports multi-arch natively?
- What if a cluster supports multi-arch but is configured homogeneously? This shouldn’t be an issue, but our code must then not create multiple jobs. Our condition for deciding whether multi-arch is enabled therefore shouldn’t depend on whether the cluster is natively multi-arch, correct (because such a cluster may still be used homogeneously)?
- Metadata for Containerfile Handling:
- What’s the best format for multi-arch Containerfiles?
- Job Execution:
- If an ARM job runs, does the AMD job always need to run?
- How do we link builds together in a manifest list?
- Build Performance:
- Can we parallelize parts of the build process? If we keep a 1:1 relationship between MOSBs and jobs, would the builds run concurrently?
- How can the push job wait for all architectures before assembling the manifest list?
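For the last two questions, one possible shape is a push step that refuses to assemble anything until every expected architecture has reported a built image, then emits an OCI image index that links the per-arch manifests. This is a sketch under assumptions: the `built` bookkeeping structure is hypothetical, and in practice a tool such as `podman manifest` would likely create and push the list rather than hand-written JSON.

```python
OCI_INDEX = "application/vnd.oci.image.index.v1+json"
OCI_MANIFEST = "application/vnd.oci.image.manifest.v1+json"


def missing_archs(expected: list, built: dict) -> list:
    """Architectures that were expected but have no built image yet."""
    return sorted(set(expected) - set(built))


def assemble_image_index(expected: list, built: dict) -> dict:
    """Build an OCI image index once every expected architecture is present.

    built maps architecture -> {"digest": "sha256:...", "size": <bytes>}
    (hypothetical bookkeeping filled in by the per-arch build jobs).
    """
    missing = missing_archs(expected, built)
    if missing:
        raise ValueError(f"builds not finished for: {', '.join(missing)}")
    return {
        "schemaVersion": 2,
        "mediaType": OCI_INDEX,
        "manifests": [
            {
                "mediaType": OCI_MANIFEST,
                "digest": built[arch]["digest"],
                "size": built[arch]["size"],
                "platform": {"architecture": arch, "os": "linux"},
            }
            for arch in sorted(set(expected))
        ],
    }
```

A push job could poll `missing_archs` until it returns an empty list, which gives a simple "wait for all N builds" gate without coupling the per-arch jobs to each other.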
Challenges
- Avoid unnecessary rebuilds when architecture order changes.
- Ensure proper metadata handling in Containerfiles.
- Handle scenarios where only one arch needs updating.
- Ensure correct linking of builds in a multi-arch manifest.
- clones: MCO-1173 On Cluster Layering HyperShift Support (New)
- is depended on by: OCPSTRAT-1938 On Cluster Layering: Parity (In Progress)