- Feature Request
- Resolution: Unresolved
- Critical
- Product / Portfolio Work
Summary:
Enable Advanced Traffic Management & DR for llm-d via OpenShift GIE (Rollouts, Failover, A/B)
Description:
The AI Product team requires advanced traffic-shaping capabilities for the llm-d inference platform. We request that the OpenShift team configure the Global Ingress Engine (GIE) to provide them, moving llm-d from basic service exposure to a managed traffic architecture.
The goal is to leverage GIE to achieve four key operational capabilities:
- Progressive Rollout Management: Ability to perform Canary deployments (e.g., shift 1%, then 5%, then 20% of traffic) rather than "Big Bang" updates.
- Automated Failover: Intelligent detection of unhealthy llm-d pods or clusters with instant traffic rerouting.
- A/B Testing Integration: Header-based routing to support experimenting with new model versions against control groups.
- Disaster Recovery (DR): A defined GIE policy for multi-region or multi-cluster failover in the event of a total site outage.
User Story:
As the llm-d Technical Architect,
I want the OpenShift GIE layer to manage ingress traffic dynamically based on health metrics, headers, and weight rules,
So that I can safely deploy new models, test experimental features on a subset of users, and guarantee service continuity during infrastructure outages.
Acceptance Criteria:
1. Intelligent Rollout (Canary/Blue-Green)
- [ ] GIE is configured to support weighted traffic splitting between two distinct llm-d release channels (e.g., stable vs. canary).
- [ ] Traffic weights can be adjusted dynamically via configuration/API without downtime (e.g., shifting the stable/canary split from 90/10 to 50/50).
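To make the weighted-split criteria above concrete, here is a minimal Python sketch of the selection logic only. It is not GIE configuration: the channel names `stable` and `canary` come from the example in this ticket, while `pick_backend` and `canary_steps` are hypothetical helper names introduced for illustration.

```python
import random


def pick_backend(weights, rng=random):
    """Pick a release channel with probability proportional to its weight.

    `weights` maps channel name -> non-negative weight (e.g. percent of traffic).
    `rng` is any object with a `choices` method (the `random` module by default).
    """
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def canary_steps(percentages):
    """Yield one weight table per rollout stage, e.g. 1% -> 5% -> 20% canary."""
    for pct in percentages:
        if not 0 <= pct <= 100:
            raise ValueError(f"canary percentage out of range: {pct}")
        yield {"stable": 100 - pct, "canary": pct}


if __name__ == "__main__":
    # Walk through the rollout stages named in this ticket.
    for table in canary_steps([1, 5, 20]):
        print(table, "->", pick_backend(table))
```

Because the weight table is plain data, moving from one stage to the next only means swapping the table, which mirrors the "adjust without downtime" requirement.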
2. A/B Testing (Header-Based Routing)
- [ ] GIE inspects incoming HTTP/gRPC headers.
- [ ] Requests containing a specific header (e.g., x-model-variant: experiment-v2) are deterministically routed to a specific backend service, bypassing the default load balancer logic.
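The header-based rule above can be sketched in a few lines of Python. This only illustrates the intended routing decision, not how GIE implements it; the backend names `llm-d-experiment-v2` and `llm-d-stable` and the function `route_request` are assumptions made for the example.

```python
def route_request(headers, variant_backends, default_backend,
                  header_name="x-model-variant"):
    """Return the backend for a request based on a single routing header.

    HTTP header names are case-insensitive, so they are lower-cased before
    the lookup. Requests without the header, or with an unknown value,
    fall through to the default backend.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    variant = normalized.get(header_name.lower())
    return variant_backends.get(variant, default_backend)


if __name__ == "__main__":
    # Hypothetical backend names, for illustration only.
    backends = {"experiment-v2": "llm-d-experiment-v2"}
    print(route_request({"X-Model-Variant": "experiment-v2"}, backends, "llm-d-stable"))
    print(route_request({}, backends, "llm-d-stable"))
```

The lookup is a pure function of the header value, so the same header always reaches the same backend, which is the "deterministic" property the criterion asks for.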
3. Automated Failover (Health-Aware)
- [ ] GIE is integrated with llm-d health probes (Liveness/Readiness).
- [ ] Upon detecting a 5xx error rate spike or latency degradation > 500ms on the primary backend, GIE automatically reroutes traffic to the healthy standby pool/cluster.
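One way to read the failover trigger above is as a sliding window over recent responses. The sketch below is an assumption-laden illustration: the 500 ms limit comes from this ticket, but the 5% error-rate threshold, the use of p95 latency, the window size, and the names `HealthWindow` and `choose_pool` are all placeholders that the OpenShift team would need to replace with agreed values.

```python
import math
from collections import deque


class HealthWindow:
    """Sliding window of recent (status, latency) samples for one backend."""

    def __init__(self, size=100, max_error_rate=0.05, max_p95_ms=500.0):
        # max_error_rate and p95 are assumed; only the 500 ms limit is from the ticket.
        self.samples = deque(maxlen=size)
        self.max_error_rate = max_error_rate
        self.max_p95_ms = max_p95_ms

    def record(self, status, latency_ms):
        self.samples.append((status, latency_ms))

    def error_rate(self):
        if not self.samples:
            return 0.0
        errors = sum(1 for status, _ in self.samples if 500 <= status <= 599)
        return errors / len(self.samples)

    def p95_latency_ms(self):
        if not self.samples:
            return 0.0
        ordered = sorted(latency for _, latency in self.samples)
        # Nearest-rank percentile.
        return ordered[max(0, math.ceil(0.95 * len(ordered)) - 1)]

    def healthy(self):
        return (self.error_rate() <= self.max_error_rate
                and self.p95_latency_ms() <= self.max_p95_ms)


def choose_pool(window, primary="primary", standby="standby"):
    """Send traffic to the standby pool while the primary window is unhealthy."""
    return primary if window.healthy() else standby
```

A production policy would likely also need hysteresis (a cool-down before failing back) to avoid flapping; that is left out here to keep the sketch short.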
4. Disaster Recovery (Multi-Site)
- [ ] Defined Failover Policy: In the event of a "Site A" outage, GIE automatically redirects 100% of global traffic to "Site B" (or the DR cluster).
- [ ] RTO (Recovery Time Objective) for ingress switching is verified to be under 2 minutes.
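The DR criteria above reduce to two checks: which site should receive traffic, and whether the switch completed within the RTO. The following Python sketch shows both under simple assumptions; the site labels are the ones used in this ticket, and `select_site` and `rto_met` are hypothetical names, not GIE features.

```python
def select_site(site_health, priority):
    """Return the first healthy site in priority order.

    `site_health` maps site name -> bool; sites missing from the map are
    treated as unhealthy.
    """
    for site in priority:
        if site_health.get(site, False):
            return site
    raise RuntimeError("no healthy site available")


def rto_met(outage_detected_at, traffic_restored_at, rto_seconds=120.0):
    """Check that ingress switching finished in under the RTO (2 minutes here).

    Both timestamps are seconds on the same clock, e.g. from time.monotonic().
    """
    return (traffic_restored_at - outage_detected_at) < rto_seconds


if __name__ == "__main__":
    health = {"Site A": False, "Site B": True}
    print(select_site(health, ["Site A", "Site B"]))
    print(rto_met(outage_detected_at=0.0, traffic_restored_at=95.0))
```

In a failover drill, the two timestamps passed to `rto_met` would come from the moment the outage is detected and the moment the first request succeeds at the DR site.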
Technical Constraints & Notes for OpenShift Team:
- Protocols: llm-d uses both HTTP/REST and gRPC. GIE must support gRPC load balancing and gRPC trailers (HTTP/2 trailing headers).
- Sticky Sessions: If possible, configure session affinity based on user-id to maximize local KV-cache hit rates on backend nodes (optional, but preferred).
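As an illustration of the session-affinity note above, the sketch below maps a user-id to a backend node with rendezvous (highest-random-weight) hashing. This is one possible technique chosen for the example, not something the ticket prescribes or a statement of how GIE handles affinity; the node names and `pick_node` are hypothetical.

```python
import hashlib


def pick_node(user_id, nodes):
    """Map a user-id to one node using rendezvous hashing.

    Every node gets a score derived from hash(node, user_id); the node with
    the highest score wins. The same user always lands on the same node, and
    removing a different node does not move that user, so the user's KV-cache
    entries on the chosen node stay useful.
    """
    if not nodes:
        raise ValueError("no backend nodes available")

    def score(node):
        digest = hashlib.sha256(f"{node}:{user_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(nodes, key=score)


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    nodes = ["llm-d-node-0", "llm-d-node-1", "llm-d-node-2"]
    print(pick_node("user-42", nodes))
```

Compared with a plain `hash(user_id) % len(nodes)`, this keeps most users on their existing node when the pool is scaled up or down, which matters when cache locality is the goal.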