Type: Feature Request
Resolution: Unresolved
Priority: Critical
Fix Version/s: None
Affects Version/s: None
Component/s: service-mesh
Labels:
- llm-d

Target Version:
None
Activity Type:
Product / Portfolio Work
Status Summary:
None
Blocked:
False
Blocked Reason:

None
Products:
None
Hierarchy Progress Bar:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Review Complete:
None
PX Impact Score:
PX Impact Range:
None
PX Priority Data:
None
PX Technical Impact:
None
PX Technical Impact Notes:
None
PX Scheduling Request:
None

Summary:

Enable Advanced Traffic Management & DR for llm-d via OpenShift GIE (Rollouts, Failover, A/B)

Description:

The AI Product team requires advanced traffic shaping capabilities for the llm-d inference platform. We request that the OpenShift team configure the Global Ingress Engine (GIE) so that llm-d moves from basic service exposure to a managed traffic architecture.

The goal is to leverage GIE to achieve four key operational capabilities:

Progressive Rollout Management: Ability to perform Canary deployments (e.g., shift 1%, then 5%, then 20% of traffic) rather than "Big Bang" updates.

Automated Failover: Intelligent detection of unhealthy llm-d pods or clusters with instant traffic rerouting.

A/B Testing Integration: Header-based routing to support experimenting with new model versions against control groups.

Disaster Recovery (DR): A defined GIE policy for multi-region or multi-cluster failover in the event of a total site outage.

User Story:

As the llm-d Technical Architect,

I want the OpenShift GIE layer to manage ingress traffic dynamically based on health metrics, headers, and weight rules,

So that I can safely deploy new models, test experimental features on a subset of users, and guarantee service continuity during infrastructure outages.

Acceptance Criteria:

1. Intelligent Rollout (Canary/Blue-Green)

[ ] GIE is configured to support weighted traffic splitting between two distinct llm-d release channels (e.g., stable vs. canary).

[ ] Traffic weights can be adjusted dynamically via configuration/API without downtime (e.g., shifting traffic from 90/10 to 50/50).
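
To make the weighted-splitting criteria above concrete, here is a minimal Python sketch of the behavior being requested. It is not GIE configuration: the `TrafficSplit` class, its method names, and the channel names `stable` and `canary` are illustrative assumptions. It shows weights being swapped at runtime while requests continue to be routed.

```python
import random
import threading
from typing import Dict, Optional


class TrafficSplit:
    """Hypothetical weighted splitter between release channels."""

    def __init__(self, weights: Dict[str, int], seed: Optional[int] = None):
        self._lock = threading.Lock()
        self._rng = random.Random(seed)
        self._weights = self._validate(weights)

    @staticmethod
    def _validate(weights: Dict[str, int]) -> Dict[str, int]:
        # Reject empty maps, negative weights, and all-zero weights.
        if not weights or any(w < 0 for w in weights.values()):
            raise ValueError("weights must be non-empty and non-negative")
        if sum(weights.values()) <= 0:
            raise ValueError("at least one weight must be positive")
        return dict(weights)

    def set_weights(self, weights: Dict[str, int]) -> None:
        # Validate first, then swap under the lock, so in-flight picks
        # always see either the old or the new weights, never a mix.
        validated = self._validate(weights)
        with self._lock:
            self._weights = validated

    def pick(self) -> str:
        # Choose a channel with probability proportional to its weight.
        with self._lock:
            names = list(self._weights)
            weights = list(self._weights.values())
            return self._rng.choices(names, weights=weights, k=1)[0]


split = TrafficSplit({"stable": 90, "canary": 10})
backend = split.pick()                            # "stable" about 90% of the time
split.set_weights({"stable": 50, "canary": 50})   # shift 90/10 to 50/50, no restart
```

Weights are relative rather than percentages, so `{"stable": 9, "canary": 1}` would behave the same as 90/10.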

2. A/B Testing (Header-Based Routing)

[ ] GIE inspects incoming HTTP/gRPC headers.

[ ] Requests containing a specific header (e.g., x-model-variant: experiment-v2) are deterministically routed to a specific backend service, bypassing the default load balancer logic.
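
The header-based criterion above can be sketched as a small Python routing function. The header name and value come from the example in this ticket; the backend names `llm-d-experiment-v2` and `llm-d-stable` are hypothetical placeholders, and this is not a GIE API.

```python
from typing import Mapping

ROUTING_HEADER = "x-model-variant"

# Hypothetical mapping from header value to backend service name.
VARIANT_BACKENDS = {"experiment-v2": "llm-d-experiment-v2"}
DEFAULT_BACKEND = "llm-d-stable"  # hypothetical default pool


def route(headers: Mapping[str, str]) -> str:
    """Return the backend for a request based on its variant header."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    variant = normalized.get(ROUTING_HEADER)
    # Unknown or missing variants fall through to the default backend.
    return VARIANT_BACKENDS.get(variant, DEFAULT_BACKEND)


route({"X-Model-Variant": "experiment-v2"})  # routed to the experiment backend
route({"content-type": "application/json"})  # routed to the default backend
```

The lookup is deterministic: the same header value always maps to the same backend, independent of any weighted split applied to default traffic.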

3. Automated Failover (Health-Aware)

[ ] GIE is integrated with llm-d health probes (Liveness/Readiness).

[ ] Upon detecting a 5xx error rate spike or latency degradation > 500ms on the primary backend, GIE automatically reroutes traffic to the healthy standby pool/cluster.
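
As a rough illustration of the failover trigger above, the following Python sketch tracks recent responses in a sliding window and switches pools when either threshold is crossed. Several details are assumptions, since the ticket does not specify them: the 5% error-rate threshold, the use of approximate p95 latency as the latency signal, the window size, and the names `HealthWindow` and `choose_pool`. Only the 500 ms figure comes from the criterion.

```python
from collections import deque


class HealthWindow:
    """Hypothetical sliding window over recent (status, latency) samples."""

    def __init__(self, size: int = 100, max_error_rate: float = 0.05,
                 max_latency_ms: float = 500.0, min_samples: int = 20):
        self._samples = deque(maxlen=size)    # oldest samples drop off
        self.max_error_rate = max_error_rate  # assumed threshold
        self.max_latency_ms = max_latency_ms  # 500 ms from the criterion
        self.min_samples = min_samples        # avoid flapping on little data

    def record(self, status_code: int, latency_ms: float) -> None:
        self._samples.append((status_code, latency_ms))

    def is_healthy(self) -> bool:
        n = len(self._samples)
        if n < self.min_samples:
            return True  # not enough evidence to fail over
        errors = sum(1 for status, _ in self._samples if 500 <= status <= 599)
        latencies = sorted(latency for _, latency in self._samples)
        # Approximate p95 by index into the sorted latencies.
        p95 = latencies[min(n - 1, int(0.95 * n))]
        return errors / n <= self.max_error_rate and p95 <= self.max_latency_ms


def choose_pool(primary: HealthWindow) -> str:
    """Route to the primary pool unless its recent window looks unhealthy."""
    return "primary" if primary.is_healthy() else "standby"
```

A production setup would also need hysteresis or a cool-down before failing back, which this sketch omits.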

4. Disaster Recovery (Multi-Site)

[ ] Defined Failover Policy: In the event of a "Site A" outage, GIE automatically redirects 100% of global traffic to "Site B" (or the DR cluster).

[ ] RTO (Recovery Time Objective) for ingress switching is verified to be under 2 minutes.
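
The two DR criteria above could be checked with logic along these lines. This is a minimal Python sketch: the function names, the site identifiers, and the way timestamps are obtained are assumptions; only the 2-minute budget is taken from the ticket.

```python
from typing import Mapping, Optional, Sequence

RTO_BUDGET_SECONDS = 120.0  # "under 2 minutes" from the acceptance criterion


def select_site(healthy: Mapping[str, bool],
                preference: Sequence[str]) -> Optional[str]:
    """Return the first healthy site in preference order, or None."""
    for site in preference:
        if healthy.get(site, False):  # unknown sites count as unhealthy
            return site
    return None


def rto_met(outage_detected_at: float, traffic_restored_at: float,
            budget_seconds: float = RTO_BUDGET_SECONDS) -> bool:
    """Check whether ingress switching finished within the RTO budget.

    Both arguments are timestamps in seconds on the same clock.
    """
    elapsed = traffic_restored_at - outage_detected_at
    if elapsed < 0:
        raise ValueError("restore time precedes outage detection")
    return elapsed < budget_seconds


# Site A is down, so all traffic goes to Site B.
target = select_site({"site-a": False, "site-b": True}, ["site-a", "site-b"])
```

In a failover drill, `rto_met` would be fed the time the outage was detected and the time the first request succeeded against the DR site.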

Technical Constraints & Notes for OpenShift Team:

Protocols: llm-d utilizes both HTTP/REST and gRPC. GIE must support gRPC load balancing and trailer headers.

Sticky Sessions: If possible, configure session affinity based on user-id to maximize local KV-cache hit rates on backend nodes (optional, but preferred).
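
One way to get user-id affinity that survives backend churn is rendezvous (highest-random-weight) hashing, sketched below in Python. This is a suggested technique, not something the ticket or GIE prescribes, and the function and pod names are hypothetical. Each user maps to the backend with the highest hash score, so removing a backend only remaps the users that were pinned to it, which keeps KV-cache locality for everyone else.

```python
import hashlib
from typing import Sequence


def pick_backend_for_user(user_id: str, backends: Sequence[str]) -> str:
    """Pin a user to a backend using rendezvous hashing."""
    if not backends:
        raise ValueError("no backends available")

    def score(backend: str) -> bytes:
        # Stable hash of the (backend, user) pair; the highest digest wins.
        return hashlib.sha256(f"{backend}|{user_id}".encode("utf-8")).digest()

    return max(backends, key=score)


pods = ["pod-a", "pod-b", "pod-c"]
pinned = pick_backend_for_user("user-42", pods)  # same pod on every call
```

If the pinned pod becomes unhealthy, calling the function with the remaining pods yields a fallback without disturbing other users' assignments.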

Assignee: Jamie Longmuir

Reporter: Naina Singh

Need Info From: None

Votes: 1

Watchers: 4

Created: 2025/12/03 5:07 AM

Updated: 2025/12/03 1:49 PM

Target start: None

Target end: None
