Epic
Resolution: Unresolved
CRI-O Additional Storage Support (Layer, Artifact, and Image Stores)
In Progress
Product / Portfolio Work
50% To Do, 33% In Progress, 17% Done
Goal
Enable configuration of additional storage locations in CRI-O (layer stores, artifact stores, and image stores) to support lazy image pulling, high-performance artifact storage, and shared image caches. This addresses performance problems where large AI/ML workload images cause significant delays in container boot time (image pull operations account for ~70% of container startup time) and inefficient storage utilization across cluster nodes.
Why is this important?
- Large AI/ML model images (multi-GB) take minutes to pull completely, delaying container startup
- Containers cannot start until 100% of the image is downloaded, impacting application availability and autoscaling responsiveness
- Large AI/ML models need to be stored on high-performance storage (SSD) for faster access
- Multiple cluster nodes redundantly pull identical container images from external registries, wasting network bandwidth
- CRI-O currently lacks flexibility in storage configuration, preventing use of dedicated storage or shared caches
- Operators cannot pre-populate shared caches across cluster nodes for air-gapped or edge deployments
- Competitors (AWS Fargate/SOCI, AWS ECS) already offer lazy pulling capabilities
- Root filesystem space is consumed by large artifacts and duplicate images that should live on separate or shared storage
Scenarios
- As an AI/ML Platform Operator, I want containers with large model images to start immediately without waiting for full image download, so that my applications are available faster and autoscaling is more responsive
- As a RHOAI Platform Operator, I want to store large ML models on SSD storage, so that model loading is faster and doesn't consume root filesystem space
- As a Cluster Admin, I want to pre-populate a read-only image cache on shared network storage (NFS), so that multiple nodes can share images without redundant pulls from external registries
- As an OpenShift Administrator, I want to configure lazy pulling and storage locations for specific workloads using declarative APIs, so that I can optimize pod startup times without manual node configuration
- As a Cluster Admin in an air-gapped environment, I want to pre-populate artifact caches and complete container images on nodes, so that pods can start without pulling from external registries
- As an Edge Deployment Operator, I want to deliver artifacts via removable media (USB) and use SSD-backed storage for frequently-used images, so that edge nodes can operate offline with fast container startup
- As an Application Developer, I want my containers to start quickly even with large images, so that my development iteration cycle is faster
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents
- ContainerRuntimeConfig API accepts additionalLayerStores, additionalArtifactStores, and additionalImageStores configuration with path-based settings, FUSE filesystem interface, and graceful fallback
- additionalLayerStores: path field for lazy pulling configuration (max 5 entries)
- additionalArtifactStores: path and optional filter fields for artifact storage (max 10 entries)
- additionalImageStores: path field for shared image cache configuration (max 10 entries)
- MCO generates correct CRI-O configuration files (storage.conf) with all additional storage settings via MachineConfig
- Single ContainerRuntimeConfig per pool (configurations merged to avoid overrides)
- MachineConfig applied to matching node pools (requires node reboot)
- CRI-O resolves image layers, artifacts, and images from additional stores in configured order
- Lazy pulling works: containers start before full image download (requires registry HTTP range request support)
- Measurable container startup time improvement for large images (>5GB)
- RHOAI team validates measurable performance improvement with SSD storage for artifacts
- Performance validation shows benefits of shared image caches reducing redundant pulls
- Feature works with shared storage (NFS) and high-performance storage (SSD)
- No regressions for standard image pulling and artifact storage behavior
- Works with registries supporting HTTP range requests (Docker Hub, Quay, GitHub Container Registry)
- Feature behind TechPreviewNoUpgrade feature gate for OpenShift 4.22
- Clear documentation on setup, compatible storage plugins, customer installation procedures (BYOS approach), and storage configuration strategies
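The field names and entry limits in the criteria above can be expressed as a small validation check. The following is a minimal sketch only, not the openshift/api or MCO implementation: the function name `validate_storage_spec`, the dict-based input shape, and the absolute-path requirement are assumptions made for illustration. Only the three field names, the `path` and `filter` fields, and the limits (5, 10, 10) come from this epic.

```python
from pathlib import PurePosixPath

# Entry limits taken from the acceptance criteria above.
MAX_ENTRIES = {
    "additionalLayerStores": 5,
    "additionalArtifactStores": 10,
    "additionalImageStores": 10,
}

# Per the criteria, only artifact stores carry an optional "filter" field.
ALLOWED_KEYS = {
    "additionalLayerStores": {"path"},
    "additionalArtifactStores": {"path", "filter"},
    "additionalImageStores": {"path"},
}


def validate_storage_spec(spec: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid.

    `spec` is a hypothetical dict mirroring the three ContainerRuntimeConfig
    fields named in this epic; the real API types may differ.
    """
    errors = []
    for name, limit in MAX_ENTRIES.items():
        entries = spec.get(name, [])
        if len(entries) > limit:
            errors.append(f"{name}: {len(entries)} entries, max is {limit}")
        for i, entry in enumerate(entries):
            unknown = set(entry) - ALLOWED_KEYS[name]
            if unknown:
                errors.append(f"{name}[{i}]: unknown fields {sorted(unknown)}")
            path = entry.get("path") or ""
            # Assumption: paths must be absolute. The epic only says "path field".
            if not PurePosixPath(path).is_absolute():
                errors.append(f"{name}[{i}]: path {path!r} is not absolute")
    return errors


if __name__ == "__main__":
    # Example paths are placeholders, not recommended locations.
    example = {
        "additionalLayerStores": [{"path": "/var/lib/example-layer-store"}],
        "additionalImageStores": [{"path": "/mnt/nfs/example-image-cache"}],
    }
    print(validate_storage_spec(example) or "spec is valid")
```

Returning every problem at once, rather than raising on the first one, lets an admin fix a rejected configuration in a single pass.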
Dependencies (internal and external)
- Critical Path: CRI-O v1.36 release (April 2026) with PR #9702 merged (for artifact stores)
- Upstream: container-libs/storage - Stabilize Additional Layer Store API (currently experimental, can change without major version bump)
- Upstream: container-libs/storage - additionalimagestores feature (already GA and stable)
- Upstream: CRI-O PR #9702 (https://github.com/cri-o/cri-o/pull/9702) - adds additional_artifact_stores configuration
- OpenShift: Enhancement proposal ✓ MERGED - PR #1934 (https://github.com/openshift/enhancements/pull/1934) - unified proposal covering all three storage types
- OpenShift: API merge in openshift/api (blocks MCO implementation)
- External: Registry HTTP range request support required for lazy pulling to function
- External: RHOAI team (Luca Burgazzoli) for performance validation
- Storage: Pre-populated image cache setup procedures for NFS/SSD storage
- Documentation: OSDOCS-10167 - Customer documentation for storage plugin installation (stargz-store, nydus-store)
- Documentation: OSDOCS-17312 - Tech Preview feature documentation
- Backport consideration: If the API work lands for 4.22, we may need to backport the CRI-O v1.36 PR (#9702) to v1.35
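Because lazy pulling depends on registry HTTP range request support, it can help to probe a registry endpoint before enabling the feature. The sketch below is illustrative and not part of CRI-O or any storage plugin: `indicates_range_support` and `probe_range_support` are hypothetical helper names, and the probe assumes the blob URL is reachable without authentication, which real registries usually do not allow (a bearer token would typically be needed).

```python
import urllib.error
import urllib.request
from typing import Mapping


def indicates_range_support(status: int, headers: Mapping[str, str]) -> bool:
    """True if a response to a ranged GET shows the server honored the range."""
    lowered = {k.lower(): v for k, v in headers.items()}
    # 206 Partial Content plus a Content-Range header means the range was
    # honored; a plain 200 means the server ignored the Range header.
    return status == 206 and "content-range" in lowered


def probe_range_support(blob_url: str, timeout: float = 10.0) -> bool:
    """Request the first byte of `blob_url` and report whether ranges work.

    Assumption: `blob_url` is a directly fetchable layer blob URL; token
    authentication is deliberately left out of this sketch.
    """
    request = urllib.request.Request(blob_url, headers={"Range": "bytes=0-0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return indicates_range_support(
                response.status, dict(response.headers.items())
            )
    except urllib.error.URLError:
        # Covers connection failures and HTTP error statuses such as 401 or 416.
        return False
```

Keeping the response interpretation separate from the network call means the decision logic can be checked without contacting a registry.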
Previous Work (Optional):
- OCPNODE-2204 - Previous attempt at lazy pulling via stargz-snapshotter (Stale/Abandoned)
- Enhancement Proposal: https://github.com/openshift/enhancements/pull/1600
- MCO PR: https://github.com/openshift/machine-config-operator/pull/4248
- Artifacts may be useful for reference
- RHEL-66490 - Related bug: Image IDs inconsistent when using zstd:chunked images
- RFE-8441 - Related feature request for artifact pre-loading (separate feature, out of scope)
- Upstream stargz-store from containerd/stargz-snapshotter project (proven, mature)
- Proven pattern: containers/storage additionalimagestores approach (already GA and stable)
Open questions:
- Can we ship Tech Preview on an experimental API? (The Additional Layer Store API is currently experimental and can change without a major version bump)
- Timeline: Will CRI-O v1.36 (April 2026 upstream release) be available in time for OpenShift 4.22? Do we need to backport PR #9702 to CRI-O v1.35?
- Customer support model: Should we provide community container images for storage plugins (stargz-store) to reduce customer burden with BYOS approach?
- Validation criteria: What are the specific performance benchmarks we should target with RHOAI validation for SSD storage?
Done Checklist
- DEV - Enhancement proposal: ✓ DONE - OCPNODE-4052 - PR #1934 (https://github.com/openshift/enhancements/pull/1934) - Merged Feb 4, 2026
- DEV - Upstream code merged - Additional Artifact Store support: https://github.com/cri-o/cri-o/pull/9702
- DEV - API changes merged: OCPNODE-4060
- DEV - MCO implementation merged: OCPNODE-4074
- CI - CI implementation: OCPNODE-4054
- QE - e2e testing automation: OCPNODE-4055
- QE - Pre-merge testing: OCPNODE-4056
- Release Technical Enablement
- DEV - Downstream build attached to advisory
Is related to
- OCPSTRAT-2623 Additional Artifact Store - 4.22 - TP (In Progress)
- OCPSTRAT-1285 Speeding Up Pulling Container Images (Tech Preview) (In Progress)