Feature
Resolution: Unresolved
Feature Summary:
The LeaderWorkerSet (LWS) API deploys and manages a group of pods as a single unit of replication, sometimes called a "super pod." It targets AI/ML inference workloads, in particular large language models (LLMs) and multi-host inference, where a model must be sharded across multiple devices and nodes. With the LWS API, OpenShift can manage distributed inference workloads in which a single leader pod coordinates multiple worker pods, simplifying orchestration for AI tasks with high compute and memory demands.
Use Case:
For AI workloads that require distributed inference, such as LLMs or deep learning models sharded across devices, LWS provides a structured way to orchestrate model replicas, each consisting of a leader and its workers in a defined topology. This lets OpenShift users deploy sharded AI workloads whose models are divided across multiple nodes, with the flexibility, scalability, and fault tolerance needed to serve large-scale inference requests efficiently.
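To make the leader/worker topology concrete, here is a minimal sketch in Python that builds a LeaderWorkerSet manifest as a plain dictionary and prints it as JSON. The `apiVersion`, `kind`, and the `replicas`, `leaderWorkerTemplate.size`, `leaderTemplate`, and `workerTemplate` fields follow the upstream LWS API as we understand it; the resource name, container name, and image are hypothetical placeholders, and the helper functions are illustrative only, not part of any library.

```python
import json


def pod_template(container_name, image):
    """Build a minimal pod template with a single container."""
    return {"spec": {"containers": [{"name": container_name, "image": image}]}}


def build_lws_manifest(name, replicas, group_size, image):
    """Return a LeaderWorkerSet manifest as a dict.

    replicas   -- number of leader/worker groups ("super pods")
    group_size -- pods per group, counting the leader
    image      -- container image (hypothetical placeholder in this sketch)
    """
    if replicas < 1 or group_size < 1:
        raise ValueError("replicas and group_size must be at least 1")
    return {
        "apiVersion": "leaderworkerset.x-k8s.io/v1",
        "kind": "LeaderWorkerSet",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "leaderWorkerTemplate": {
                "size": group_size,
                "leaderTemplate": pod_template("leader", image),
                "workerTemplate": pod_template("worker", image),
            },
        },
    }


def total_pods(manifest):
    """Total pods across all groups: replicas * size."""
    spec = manifest["spec"]
    return spec["replicas"] * spec["leaderWorkerTemplate"]["size"]


if __name__ == "__main__":
    # Two model replicas, each sharded across one leader and three workers.
    manifest = build_lws_manifest(
        name="example-inference",          # hypothetical resource name
        replicas=2,
        group_size=4,
        image="example.com/inference:latest",  # hypothetical image
    )
    print(json.dumps(manifest, indent=2))
```

Under these assumptions, the printed JSON could be saved to a file and applied to a cluster that has the LWS controller installed; a real deployment would also need resource requests (for example GPUs) and the model server's startup command.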
https://github.com/kubernetes-sigs/lws
https://github.com/kubernetes-sigs/lws/tree/main/docs/examples/llamacpp
https://github.com/kubernetes-sigs/lws/tree/main/docs/examples/vllm/GPU