-
Epic
-
Resolution: Unresolved
-
Normal
-
Additional Layer Store Support
-
To Do
-
Product / Portfolio Work
-
-
100% To Do, 0% In Progress, 0% Done
Goal
Enable configuration of additional layer stores in CRI-O to support lazy image pulling, allowing containers to start before the entire image is downloaded. This addresses the performance problem where large AI/ML workload images cause significant delays in container boot time (image pull operations account for ~70% of container startup time).
Why is this important?
- Large AI/ML model images (multi-GB) take minutes to pull completely, delaying container startup
- Containers cannot start until 100% of the image is downloaded, impacting application availability and autoscaling responsiveness
- Competitors (AWS Fargate/SOCI, AWS ECS) already offer lazy pulling capabilities
- Without this feature, OpenShift AI/ML workloads suffer from poor startup performance and user experience
Scenarios
- As an AI/ML Platform Operator, I want containers with large model images to start immediately without waiting for full image download, so that my applications are available faster and autoscaling is more responsive
- As an OpenShift Administrator, I want to configure lazy pulling for specific workloads using declarative APIs, so that I can optimize pod startup times without manual node configuration
- As an Application Developer, I want my containers to start quickly even with large images, so that my development iteration cycle is faster
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents
- ContainerRuntimeConfig API accepts additionalLayerStores configuration with path field
- MCO generates correct CRI-O storage.conf with Additional Layer Store settings
- CRI-O resolves image layers from additional layer stores (lazy pulling works)
- Measurable container startup time improvement for large images (>5GB)
- No regressions for standard image pulling behavior
- Works with registries supporting HTTP range requests (Docker Hub, Quay, GitHub Container Registry)
- Feature behind TechPreviewNoUpgrade feature gate for OpenShift 4.22
- Clear documentation on setup, compatible storage plugins, and customer installation procedures (BYOS approach)
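As a rough illustration of the two configuration criteria above (the API accepting `additionalLayerStores` entries with a `path` field, and MCO rendering them into storage.conf), here is a minimal Python sketch. It is not the MCO implementation. The shape of the `spec` dictionary, the example path, the helper name `render_storage_options`, and the absolute-path check are all assumptions made for illustration; only the `additionallayerstores` key under `[storage.options]` comes from the existing containers storage.conf format.

```python
import json


def render_storage_options(layer_stores):
    """Render a [storage.options] fragment for storage.conf.

    `layer_stores` is a list of dicts, each with a "path" key, mirroring
    the `path` field named in the acceptance criteria (assumed shape).
    """
    paths = []
    for store in layer_stores:
        path = store["path"]
        # Assumption: only absolute paths are accepted.
        if not path.startswith("/"):
            raise ValueError(f"layer store path must be absolute: {path!r}")
        paths.append(path)
    # json.dumps yields a double-quoted string, which is also a valid
    # TOML basic string for plain ASCII paths.
    quoted = ", ".join(json.dumps(p) for p in paths)
    return "[storage.options]\nadditionallayerstores = [" + quoted + "]\n"


# Hypothetical ContainerRuntimeConfig spec fragment; the final API may differ.
spec = {"additionalLayerStores": [{"path": "/var/lib/stargz-store/store"}]}
fragment = render_storage_options(spec["additionalLayerStores"])
print(fragment)
```

A real implementation would also need to merge this fragment with the rest of the generated storage.conf and sit behind the TechPreviewNoUpgrade feature gate.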
Dependencies (internal and external)
- Upstream: container-libs/storage - Stabilize Additional Layer Store API (currently experimental, can change without major version bump)
- OpenShift: Enhancement proposal approval covering both additionalLayerStores and additionalArtifactStores API design
- OpenShift: API merge in openshift/api before MCO implementation
- External: Registry HTTP range request support required for lazy pulling to function
- Documentation: OSDOCS-10167 - Customer documentation for storage plugin installation (stargz-store, nydus-store)
- Related: OCPNODE-4051 (Additional Artifact Store Support) - shares enhancement proposal and MCO implementation
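To make the registry dependency above concrete, the following Python sketch shows what "HTTP range request support" means at the protocol level: the client asks for a byte range of a blob, and a supporting server answers with status 206 and a `Content-Range` header. The helper names `range_header` and `is_partial_content` are hypothetical and no real registry is contacted; this is a sketch of the HTTP semantics, not of how any storage plugin is implemented.

```python
def range_header(offset, length):
    """Build a Range header for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive at both ends, so the last byte
    position is offset + length - 1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def is_partial_content(status, headers):
    """Return True if a response looks like an honoured byte-range request.

    A server that ignores the Range header typically replies 200 with the
    full body, which defeats lazy pulling.
    """
    # Header names are case-insensitive, so normalise before lookup.
    normalised = {k.lower(): v for k, v in headers.items()}
    return status == 206 and normalised.get("content-range", "").startswith("bytes ")


request_headers = range_header(0, 1024)
print(request_headers)
print(is_partial_content(206, {"Content-Range": "bytes 0-1023/4096"}))
print(is_partial_content(200, {}))
```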
Previous Work (Optional):
- OCPNODE-2204 - Previous attempt at lazy pulling via stargz-snapshotter (Stale/Abandoned)
- Enhancement Proposal: https://github.com/openshift/enhancements/pull/1600
- MCO PR: https://github.com/openshift/machine-config-operator/pull/4248
- RHEL-66490 - Related bug: Image IDs inconsistent when using zstd:chunked images
- Upstream stargz-store from containerd/stargz-snapshotter project
Open questions:
- Can we ship Tech Preview on an experimental API? (The Additional Layer Store API is currently experimental.)
- Should we provide community container images for common plugins (stargz-store) to reduce customer burden?
- What is the support boundary between OpenShift (API/config) and customer responsibility (plugin binaries)?
- Do we need plugin health checks/validation tools?
- Should additionalLayerStores support image pattern filtering like additionalArtifactStores?
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Technical Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to container-libs/storage PR for API stabilization>
- DEV - Enhancement proposal merged: <link to openshift/enhancements PR>
- DEV - API changes merged: <link to openshift/api PR>
- DEV - MCO implementation merged: <link to openshift/machine-config-operator PR>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to OSDOCS-10167 PR>
is related to
- OCPNODE-4052 Enhancement Proposal - Advanced Container Storage Configuration for CRI-O (To Do)
- OCPNODE-4060 OpenShift API - Advanced Container Storage Configuration for CRI-O (To Do)