Type: Feature Request
Resolution: Unresolved
Priority: Normal
Component: Product / Portfolio Work
Summary: Speed up AI model loading with a pre-download mechanism for OCI volumes
Description: LLM and GenAI workloads are latency-sensitive and frequently involve large models with long initialization times. These workloads often experience spiky traffic patterns and require responsive autoscaling to maintain performance.

Preloading models into the OCI volume enables inference services to start faster when scaling up, reducing the time between a scaling decision and the ability to serve requests. This is especially valuable in environments using KEDA or any other autoscaler, since the infrastructure can respond to load changes with less delay and avoid cold-start bottlenecks. A minimal sketch of the current behavior follows.
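For context, here is a minimal sketch of mounting a model from an OCI artifact as it works today, assuming the Kubernetes OCI volume source (the `ImageVolume` feature gate available in recent releases); the image references and mount path are hypothetical. Without a pre-download mechanism, the kubelet pulls the artifact only when the pod is scheduled onto the node, which is the cold-start delay described above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference
spec:
  containers:
  - name: server
    image: registry.example.com/inference/server:latest   # hypothetical serving image
    volumeMounts:
    - name: model
      mountPath: /models      # model weights appear here once the artifact is pulled
      readOnly: true
  volumes:
  - name: model
    image:                    # OCI volume source (ImageVolume feature gate)
      reference: registry.example.com/models/llama-7b:v1  # hypothetical model artifact
      pullPolicy: IfNotPresent  # reuse an already-pulled copy on the node if present
```

One interim workaround is a low-priority DaemonSet that mounts the same image volume on every node, so the artifact is already in each node's image store before the autoscaler adds inference replicas; the requested feature would make that pre-pull behavior a first-class mechanism rather than a workaround.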