OpenShift Node / OCPNODE-4050

Additional Layer Store Support

    • Type: Epic
    • Resolution: Unresolved
    • Priority: Normal
    • Additional Layer Store Support
    • To Do
    • Product / Portfolio Work
    • OCPSTRAT-1285 Speeding Up Pulling Container Images
    • 100% To Do, 0% In Progress, 0% Done

      Goal

      Enable configuration of additional layer stores in CRI-O to support lazy image pulling, so that containers can start before the entire image is downloaded. This addresses the startup delays caused by large AI/ML workload images, where image pull operations account for ~70% of container startup time.
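
      To put the ~70% figure in perspective, the following is a rough Amdahl's-law estimate of how much container startup could improve. It is a sketch only: the function name and the pullReduction values are illustrative assumptions, not measurements from this epic, and it ignores any extra read latency a lazily pulled image may incur after the container has started.

```go
package main

import "fmt"

// startupSpeedup applies Amdahl's law to container startup.
// pullShare is the fraction of startup time spent pulling the image
// (0.70 per the Goal above). pullReduction is the fraction of that pull
// time removed from the critical path by lazy pulling (an assumption).
func startupSpeedup(pullShare, pullReduction float64) float64 {
	return 1 / (1 - pullShare*pullReduction)
}

func main() {
	// Hypothetical reductions; 1.0 is the theoretical upper bound.
	for _, r := range []float64{0.5, 0.9, 1.0} {
		fmt.Printf("pull time reduced by %.0f%% -> %.2fx faster startup\n",
			r*100, startupSpeedup(0.70, r))
	}
}
```

      Under these assumptions, removing the pull from the critical path entirely caps the startup improvement at roughly 3.3x.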

      Why is this important?

      • Large AI/ML model images (multi-GB) take minutes to pull completely, delaying container startup
      • Containers cannot start until 100% of the image is downloaded, impacting application availability and autoscaling responsiveness
      • Competitors (AWS Fargate/SOCI, AWS ECS) already offer lazy pulling capabilities
      • Without this feature, OpenShift AI/ML workloads suffer from poor startup performance and user experience

      Scenarios

      1. As an AI/ML Platform Operator, I want containers with large model images to start immediately without waiting for full image download, so that my applications are available faster and autoscaling is more responsive
      2. As an OpenShift Administrator, I want to configure lazy pulling for specific workloads using declarative APIs, so that I can optimize pod startup times without manual node configuration
      3. As an Application Developer, I want my containers to start quickly even with large images, so that my development iteration cycle is faster

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents
      • ContainerRuntimeConfig API accepts additionalLayerStores configuration with path field
      • MCO generates correct CRI-O storage.conf with Additional Layer Store settings
      • CRI-O resolves image layers from additional layer stores (lazy pulling works)
      • Measurable container startup time improvement for large images (>5GB)
      • No regressions for standard image pulling behavior
      • Works with registries supporting HTTP range requests (Docker Hub, Quay, GitHub Container Registry)
      • Feature behind TechPreviewNoUpgrade feature gate for OpenShift 4.22
      • Clear documentation on setup, compatible storage plugins, and customer installation procedures (BYOS approach)
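
      The API and MCO criteria above can be sketched as follows. This is a minimal illustration, not the proposed openshift/api type or the MCO implementation: the AdditionalLayerStore struct and renderStorageConf function are hypothetical names, and only the path field comes from this epic. The additionallayerstores key is the existing containers-storage.conf option. Some store plugins document an extra suffix on the path (for example :ref for stargz-store), which this sketch leaves out.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// AdditionalLayerStore is a hypothetical stand-in for one entry of the
// proposed additionalLayerStores list. Only Path is named in the epic.
type AdditionalLayerStore struct {
	Path string
}

// renderStorageConf returns the [storage.options] fragment that would
// point container storage at the given stores. Relative paths are
// rejected because node paths must be unambiguous (an assumed rule).
func renderStorageConf(stores []AdditionalLayerStore) (string, error) {
	quoted := make([]string, 0, len(stores))
	for _, s := range stores {
		if !path.IsAbs(s.Path) {
			return "", fmt.Errorf("additional layer store path %q must be absolute", s.Path)
		}
		// %q yields a double-quoted string, which matches TOML basic
		// strings for ordinary filesystem paths.
		quoted = append(quoted, fmt.Sprintf("%q", path.Clean(s.Path)))
	}
	return "[storage.options]\nadditionallayerstores = [" +
		strings.Join(quoted, ", ") + "]\n", nil
}

func main() {
	// The store path below is an example value, not a required location.
	conf, err := renderStorageConf([]AdditionalLayerStore{
		{Path: "/var/lib/stargz-store/store"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(conf)
}
```

      Validating paths before rendering keeps a malformed ContainerRuntimeConfig from producing a storage.conf that CRI-O cannot load.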

      Dependencies (internal and external)

      1. Upstream: container-libs/storage - Stabilize the Additional Layer Store API (currently experimental; it can change without a major version bump)
      2. OpenShift: Enhancement proposal approval covering both additionalLayerStores and additionalArtifactStores API design
      3. OpenShift: API merge in openshift/api before MCO implementation
      4. External: Registry HTTP range request support required for lazy pulling to function
      5. Documentation: OSDOCS-10167 - Customer documentation for storage plugin installation (stargz-store, nydus-store)
      6. Related: OCPNODE-4051 (Additional Artifact Store Support) - shares enhancement proposal and MCO implementation
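
      The registry dependency (item 4) could be checked with a probe along these lines. It is a sketch under assumptions: supportsRangeRequests and newFakeBlobServer are hypothetical helpers, the fake server stands in for a registry blob endpoint, and a real registry would additionally require authentication and may redirect blob requests to a storage backend.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// supportsRangeRequests requests only the first byte of url and reports
// whether the server answered 206 Partial Content, meaning it honored
// the Range header instead of returning the whole body.
func supportsRangeRequests(client *http.Client, url string) (bool, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	req.Header.Set("Range", "bytes=0-0")
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusPartialContent, nil
}

// newFakeBlobServer is a test double for a blob endpoint. With
// honorRange set it serves the single requested byte with 206;
// otherwise it ignores Range and returns the full body with 200.
func newFakeBlobServer(honorRange bool) *httptest.Server {
	blob := []byte("layer-bytes")
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if honorRange && r.Header.Get("Range") == "bytes=0-0" {
			w.Header().Set("Content-Range", fmt.Sprintf("bytes 0-0/%d", len(blob)))
			w.WriteHeader(http.StatusPartialContent)
			w.Write(blob[:1])
			return
		}
		w.Write(blob)
	}))
}

func main() {
	for _, honor := range []bool{true, false} {
		srv := newFakeBlobServer(honor)
		ok, err := supportsRangeRequests(srv.Client(), srv.URL)
		srv.Close()
		fmt.Printf("server honors Range=%v, probe result=%v, err=%v\n", honor, ok, err)
	}
}
```

      A probe like this could also feed the plugin validation tooling raised in the open questions, since a registry that ignores Range forces a full download and removes the benefit of lazy pulling.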

      Previous Work (Optional):

      1. OCPNODE-2204 - Previous attempt at lazy pulling via stargz-snapshotter (Stale/Abandoned)
      2. RHEL-66490 - Related bug: Image IDs inconsistent when using zstd:chunked images
      3. Upstream stargz-store from containerd/stargz-snapshotter project

      Open questions:

      1. Can we ship a Tech Preview feature on top of an experimental API? (The Additional Layer Store API is currently experimental.)
      2. Should we provide community container images for common plugins (stargz-store) to reduce customer burden?
      3. What is the support boundary between OpenShift (API/config) and customer responsibility (plugin binaries)?
      4. Do we need plugin health checks/validation tools?
      5. Should additionalLayerStores support image pattern filtering like additionalArtifactStores?

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Technical Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to container-libs/storage PR for API stabilization>
      • DEV - Enhancement proposal merged: <link to openshift/enhancements PR>
      • DEV - API changes merged: <link to openshift/api PR>
      • DEV - MCO implementation merged: <link to openshift/machine-config-operator PR>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to OSDOCS-10167 PR>

              sgrunert@redhat.com Sascha Grunert