  1. Machine Config Operator
  2. MCO-1344

On Cluster Layering Heterogeneous Clusters Support



      Summary

      This epic enables on-cluster layering (OCL) to support multi-arch builds on multi-architecture (heterogeneous) clusters. The main goal is to implement a workflow that builds architecture-specific images, pushes them to an internal registry, and assembles them into a multi-arch manifest list. Without a multi-arch manifest, OpenShift might try to run, for example, an AMD image on an ARM node, causing a failure. With multi-arch image support, OpenShift will automatically pull the correct architecture-specific image.

      Key Objectives

      • Multi-arch build jobs: Create one build job per architecture detected on the parent cluster, each able to run natively on its respective nodes. Architectures can be amd64, arm64, ppc64le, or s390x.
      • Manifest list assembly: Aggregate arch-specific images into a manifest list and push to Quay.
      • Cluster architecture detection: Determine if the cluster is multi-arch to enable/disable multi-arch workflows.
      • MOSB controller updates: Handle 1:N relationships (one MachineOSBuild (MOSB) to multiple jobs) and track statuses.
        • We may need to rework how MachineOSBuilds pair with jobs. Currently the relationship is 1:1: every MachineOSBuild has exactly one job. Keeping it could mean that customers would run multiple MachineOSBuilds. We do not yet know whether this needs to be reworked so that one MachineOSBuild can run multiple jobs.
          • Spike
      • User Experience: Ensure changes to architecture order in the MachineOSConfig (MOSC) don't trigger unnecessary rebuilds.
        • Example: if the MOSC is changed for one architecture but not the other, should that trigger a rebuild for both?
          • Spike

      Multi-Arch Build Process

      • Create separate build jobs, one for each architecture present in the customer's cluster.
        • Each job should run natively on its respective node type
      • Detect if the cluster is homogeneous (single arch) or heterogeneous (multi-arch).
      • Push built images to an internal registry with arch-specific tags.
      • Implement a push job that:
        • Waits for all N builds to complete (N = 2 when two architectures are in use).
        • Assembles them into a manifest list.
        • Pushes the final multi-arch image to Quay.
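The manifest list assembled by the push job can be pictured as an OCI image index that points at one image manifest per architecture. The following is a minimal sketch of that document, not the implementation this epic will use: the function name, the input shape, and the sample digests are hypothetical, and in practice a tool such as buildah or podman would normally generate and push the index.

```python
import json

# Media types defined by the OCI image specification.
INDEX_MEDIA_TYPE = "application/vnd.oci.image.index.v1+json"
MANIFEST_MEDIA_TYPE = "application/vnd.oci.image.manifest.v1+json"


def build_image_index(arch_images):
    """Return an OCI image index (manifest list) for per-arch images.

    arch_images maps an architecture name (e.g. "amd64") to a dict holding
    the "digest" and "size" of that architecture's already-pushed manifest.
    This input shape is an assumption made for the sketch.
    """
    manifests = []
    for arch in sorted(arch_images):  # sorted so the output is deterministic
        image = arch_images[arch]
        manifests.append({
            "mediaType": MANIFEST_MEDIA_TYPE,
            "digest": image["digest"],
            "size": image["size"],
            "platform": {"architecture": arch, "os": "linux"},
        })
    return {
        "schemaVersion": 2,
        "mediaType": INDEX_MEDIA_TYPE,
        "manifests": manifests,
    }


# Hypothetical digests and sizes, for illustration only.
example_index = build_image_index({
    "arm64": {"digest": "sha256:" + "b" * 64, "size": 1024},
    "amd64": {"digest": "sha256:" + "a" * 64, "size": 1024},
})
print(json.dumps(example_index, indent=2))
```

The `platform` entry is what lets a container runtime pick the image matching the node's architecture when it pulls the multi-arch tag.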

      Handling Multi-Arch Metadata

      • Determine how to structure Containerfiles for different architectures.
      • Decide whether to:
        • Have a single Containerfile with conditionals for each architecture.
        • Maintain separate Containerfiles per architecture.
      • Research the best way to attach architecture metadata in the build process.

      Build Controller Enhancements

      • Modify the build controller to track multiple jobs per MOSB.
      • Handle the scenario where (for example) an ARM update doesn’t require an AMD rebuild (currently, MachineOSBuild assumes a one-to-one relationship).
      • Investigate how to bind architecture-specific builds without triggering unnecessary re-runs.
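If one MOSB ends up tracking several jobs, the controller needs a rule for collapsing per-architecture job states into a single build state. The sketch below shows one possible rule. The state names and the function are hypothetical and do not correspond to the actual MachineOSBuild condition types.

```python
def aggregate_build_status(job_states, expected_archs):
    """Collapse per-architecture job states into one overall state.

    job_states maps architecture -> "pending", "running", "succeeded" or
    "failed" (hypothetical state names). expected_archs is the collection
    of architectures that must all be built before the push step can run.
    """
    # An architecture with no recorded job yet counts as pending.
    states = [job_states.get(arch, "pending") for arch in expected_archs]
    if any(s == "failed" for s in states):
        return "failed"
    if states and all(s == "succeeded" for s in states):
        return "succeeded"
    if any(s in ("running", "succeeded") for s in states):
        return "building"
    return "pending"


# Only the amd64 job has finished, so the overall build is still in progress.
print(aggregate_build_status({"amd64": "succeeded"}, ["amd64", "arm64"]))
```

A push job could use the same rule as its gate: it proceeds only once the aggregate state is "succeeded".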

      Avoiding Unnecessary Builds

      • Ensure builds are triggered only when actual changes occur.
      • Prevent unnecessary rebuilds if the order of architectures changes in MOSC.
      • This list can be expanded as further cases are identified.
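One way to make architecture order irrelevant is to normalize the build inputs before hashing them, so that reordering alone never changes the value used to decide whether a rebuild is needed. This is a sketch under assumptions: the function and its two inputs are hypothetical, and the real MOSC fields that feed a build may differ.

```python
import hashlib
import json


def build_inputs_fingerprint(architectures, containerfile):
    """Hash build inputs so that architecture order does not matter."""
    canonical = {
        # Sort and de-duplicate so ["arm64", "amd64"] and
        # ["amd64", "arm64"] produce the same fingerprint.
        "architectures": sorted(set(architectures)),
        "containerfile": containerfile,
    }
    payload = json.dumps(canonical, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


before = build_inputs_fingerprint(["amd64", "arm64"], "FROM base\n")
after = build_inputs_fingerprint(["arm64", "amd64"], "FROM base\n")
print(before == after)
```

A real change, such as an edited Containerfile or a newly added architecture, still produces a different fingerprint.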

      Cluster Architecture Detection

      • Modify the Build Controller to detect the cluster’s architecture using:
      oc get nodes -o json | jq '.items[].status.nodeInfo.architecture'
      • Store this information in:
        • An MCP annotation.
        • A ConfigMap that jobs can read.
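The detection step reduces to collecting the distinct values of `status.nodeInfo.architecture` across all nodes and checking whether there is more than one. The sketch below works on the JSON that `oc get nodes -o json` prints. The helper names and the trimmed sample are hypothetical, and a controller would more likely read the same field through the Kubernetes API than shell out to `oc`.

```python
import json


def detect_architectures(nodes_json):
    """Return the sorted, unique node architectures from a node list.

    nodes_json is the JSON text of a node list, in the shape printed by
    `oc get nodes -o json`.
    """
    node_list = json.loads(nodes_json)
    archs = {
        item["status"]["nodeInfo"]["architecture"]
        for item in node_list.get("items", [])
    }
    return sorted(archs)


def is_heterogeneous(architectures):
    """A cluster is treated as multi-arch if its nodes span more than one architecture."""
    return len(set(architectures)) > 1


# Trimmed, hypothetical sample containing only the field used above.
sample = json.dumps({"items": [
    {"status": {"nodeInfo": {"architecture": "amd64"}}},
    {"status": {"nodeInfo": {"architecture": "arm64"}}},
    {"status": {"nodeInfo": {"architecture": "amd64"}}},
]})
detected = detect_architectures(sample)
print(detected, is_heterogeneous(detected))
```

Because the result is based on the nodes actually present, a cluster that supports multi-arch but currently runs a single architecture is reported as homogeneous.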

      Storage & Deployment

      • Define where to store images temporarily before pushing them to Quay.
      • Ensure push jobs run on the correct nodes (for example, ARM builds only on ARM nodes).
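Pinning a job to nodes of a given architecture can be done with a node selector on the well-known `kubernetes.io/arch` node label. The following sketch builds such a Job manifest as a plain dictionary. The function, the naming scheme, and the example image reference are hypothetical and only illustrate where the selector goes.

```python
def build_job_manifest(build_name, arch, builder_image):
    """Return a batch/v1 Job manifest pinned to nodes of one architecture."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"{build_name}-{arch}"},
        "spec": {
            "template": {
                "spec": {
                    # Schedule only onto nodes whose architecture label matches.
                    "nodeSelector": {"kubernetes.io/arch": arch},
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": "build", "image": builder_image},
                    ],
                }
            }
        },
    }


# Hypothetical build name and image reference, for illustration only.
arm_job = build_job_manifest("example-build", "arm64", "registry.example.com/builder:latest")
print(arm_job["metadata"]["name"])
```

Creating one such manifest per detected architecture gives each build a job that can only land on a matching node.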

      Considerations & Open Questions

      • Architecture Binding:
        • How do we bind the correct architecture to each build job?
        • Should we annotate each MOSB with its architecture?
      • Cluster Topology:
        • How do we detect if the cluster supports multi-arch natively?
        • What if a cluster supports multi-arch but is configured homogeneously? This should not be an issue, but in that case our code cannot create multiple jobs. Our condition for deciding whether multi-arch is enabled should therefore not depend on whether the cluster is natively multi-arch, since such a cluster may still be used homogeneously. Is that correct?
      • Metadata for Containerfile Handling:
        • What is the best format for multi-arch Containerfiles?
      • Job Execution:
        • If an ARM job runs, does the AMD job always need to run?
        • How do we link builds together in a manifest list?
      • Build Performance:
        • Can we parallelize parts of the build process? If we keep a 1:1 relationship between MOSBs and jobs, would the jobs run concurrently?
        • How can the push job wait for all architectures before assembling the manifest list?

      Challenges

      • Avoid unnecessary rebuilds when architecture order changes.
      • Ensure proper metadata handling in Containerfiles.
      • Handle scenarios where only one arch needs updating.
      • Ensure correct linking of builds in a multi-arch manifest.
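For the case where only one architecture needs updating, one possible approach is to keep a fingerprint of each architecture's build inputs and rebuild only the architectures whose fingerprint changed. This is a sketch of that idea, not a decision: whether an ARM change should also rebuild AMD is still an open question in this epic, and the function and fingerprint values below are hypothetical.

```python
def architectures_to_rebuild(previous, current):
    """Return the architectures whose build inputs changed.

    previous and current map architecture -> fingerprint of that
    architecture's build inputs. Architectures that are new in current
    are always rebuilt.
    """
    return sorted(
        arch
        for arch, fingerprint in current.items()
        if previous.get(arch) != fingerprint
    )


# Hypothetical fingerprints: only the arm64 inputs changed.
changed = architectures_to_rebuild(
    {"amd64": "aaa", "arm64": "bbb"},
    {"amd64": "aaa", "arm64": "ccc"},
)
print(changed)
```

The manifest list would then be reassembled from the rebuilt images plus the unchanged images from the previous build.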

              dkhater@redhat.com Dalia Khater
              mkrejci-1 Michelle Krejci