OpenShift Request For Enhancement / RFE-8441

Speed up AI model-loading by having a pre-download mechanism for OCI volumes


    • Type: Feature Request
    • Resolution: Unresolved
    • Priority: Normal
    • Components: AI/ML Workloads, Node
    • Work Type: Product / Portfolio Work

      1- Speed up AI model-loading by having a pre-download mechanism for OCI volumes

      2- LLM and GenAI workloads are latency-sensitive and frequently involve large models with long initialization times. These workloads often experience spiky traffic patterns and require responsive autoscaling to maintain performance.

      Preloading models into the OCI volume lets inference services start faster when scaling up, reducing the time between a scaling decision and the ability to serve requests. This is especially valuable in environments using KEDA or any other autoscaler, as the infrastructure can respond to load changes with less delay and avoid cold-start bottlenecks.
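      For context, upstream Kubernetes can expose an OCI artifact as a read-only pod volume through the image volume source (the ImageVolume feature gate, beta in recent releases). The sketch below shows how an inference pod might consume a model packaged this way; the image references and mount path are hypothetical placeholders, not part of this RFE.

      apiVersion: v1
      kind: Pod
      metadata:
        name: llm-inference
      spec:
        containers:
        - name: server
          image: quay.io/example/inference-server:latest   # hypothetical serving image
          volumeMounts:
          - name: model
            mountPath: /models/llm                         # model files appear here, read-only
        volumes:
        - name: model
          image:                                           # OCI image volume source (ImageVolume)
            reference: quay.io/example/llm-model:v1        # hypothetical model artifact
            pullPolicy: IfNotPresent                       # reuse the artifact if already on the node

      Until a first-class pre-download mechanism exists, one workaround pattern is a DaemonSet that declares the same image volume, so every node fetches the model artifact ahead of scale-up; inference pods using pullPolicy: IfNotPresent can then start without waiting on the pull. A minimal sketch of that pattern follows (again with hypothetical image names, and not a supported mechanism):

      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        name: model-prepull
      spec:
        selector:
          matchLabels:
            app: model-prepull
        template:
          metadata:
            labels:
              app: model-prepull
          spec:
            containers:
            - name: pause
              image: registry.k8s.io/pause:3.9             # idle container; keeps the pod resident
              volumeMounts:
              - name: model
                mountPath: /models
            volumes:
            - name: model
              image:
                reference: quay.io/example/llm-model:v1    # same hypothetical artifact
                pullPolicy: IfNotPresent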

       

       

              Gaurav Singh (gausingh@redhat.com)
              Myriam Fentanes (mfentane@redhat.com)