Type: Epic
Resolution: Unresolved
Priority: Major
This epic aims to provide best practices for configuring OpenShift Service Mesh (OSSM/Istio/Envoy) for high-performance and large-scale environments. It covers documenting the scaling behavior of OSSM components such as istiod, gateways, ztunnel, waypoints, and sidecars; defining the relevant inputs (e.g. number of pods, services, namespaces, clusters, and configuration size) and key metrics (throughput, latency, CPU/memory usage); identifying common usage scenarios (REST throughput, websocket latency, resource efficiency for internal traffic, ambient vs. sidecar, etc.); and developing optimization strategies for each. The epic also includes investigating customer usage patterns to inform our guidance. While the outcome of this epic should influence our perf & scale automated tests (i.e. terminology and scenarios should match), it does not cover creation of the tests themselves.
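A minimal sketch of how the inputs, metrics, and scenarios named above could be captured in a shared schema, so the documentation and the perf & scale automated tests end up using the same terminology. All field and scenario names here are illustrative assumptions, not a defined format:

```python
from dataclasses import dataclass, field

@dataclass
class ScenarioInputs:
    """Scale inputs the guidance is parameterized by (names are illustrative)."""
    pods: int
    services: int
    namespaces: int
    clusters: int = 1
    config_size_kb: int = 0   # approximate size of the mesh configuration

@dataclass
class ScenarioMetrics:
    """Key metrics the epic calls out; values would be filled in from benchmark runs."""
    throughput_rps: float | None = None
    latency_p99_ms: float | None = None
    cpu_cores: float | None = None
    memory_mib: float | None = None

@dataclass
class Scenario:
    """A named usage scenario, e.g. 'REST throughput' or 'ambient vs. sidecar'."""
    name: str
    data_plane_mode: str          # "sidecar", "ambient", or "ambient+waypoint"
    inputs: ScenarioInputs
    metrics: ScenarioMetrics = field(default_factory=ScenarioMetrics)
```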
Original Description:
There is a broad range of topics that fall under "performance and scale" for service mesh that we are often asked to address. This issue was originally created for OpenShift AI, but it is applicable to any user of OpenShift Service Mesh (Istio) or Envoy. A common question is "How can I set up service mesh (or Ingress) for high load or scale?".
Questions this issue should look to answer:
- How do we recommend OSSM be configured for a high-performance deployment (large number of requests per second)? How should resources be adjusted as these parameters increase?
- This question can be answered for:
- A standalone Istio gateway (such as OpenShift Ingress w/ Gateway API support)
- An Istio gateway + an example application (Bookinfo) with sidecars and mTLS encryption enabled
- An Istio gateway + an example application (Bookinfo) in ambient mode (ztunnel) with mTLS encryption enabled
- An Istio gateway + an example application (Bookinfo) in ambient mode (ztunnel + waypoint proxy) with mTLS encryption enabled
- How do we recommend OSSM be configured for a high-scale environment (large number of services, namespaces, nodes, clusters)? How should resources be adjusted as these parameters increase? (See the rough sizing sketch after this list.)
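As a starting point for the "how should resources be adjusted" questions above, a minimal back-of-the-envelope sizing sketch is shown below. The per-replica throughput and per-sidecar overhead constants are placeholders to be replaced with numbers measured by the perf & scale tests, not recommendations:

```python
import math

# Placeholder capacity figures; these must come from actual benchmark runs
# (gateway RPS per replica, sidecar CPU/memory overhead per pod, etc.).
GATEWAY_RPS_PER_REPLICA = 10_000    # assumed sustainable RPS per gateway replica
SIDECAR_CPU_MILLICORES = 100        # assumed steady-state CPU per sidecar
SIDECAR_MEMORY_MIB = 128            # assumed steady-state memory per sidecar

def gateway_replicas(target_rps: float, headroom: float = 0.5) -> int:
    """Gateway replicas needed to serve target_rps with the given headroom fraction."""
    return max(2, math.ceil(target_rps * (1 + headroom) / GATEWAY_RPS_PER_REPLICA))

def sidecar_overhead(meshed_pods: int) -> dict:
    """Aggregate data-plane overhead for a sidecar-mode mesh with meshed_pods pods."""
    return {
        "cpu_cores": meshed_pods * SIDECAR_CPU_MILLICORES / 1000,
        "memory_gib": meshed_pods * SIDECAR_MEMORY_MIB / 1024,
    }

if __name__ == "__main__":
    print(gateway_replicas(target_rps=50_000))   # 8 replicas with 50% headroom
    print(sidecar_overhead(meshed_pods=500))     # {'cpu_cores': 50.0, 'memory_gib': 62.5}
```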
It should also be noted that the tests/infrastructure to carry this out should be the same for the product as for upstream, so to the greatest degree possible this work should use and contribute to upstream project material. That said, since upstream Istio uses BoringSSL while downstream Istio uses OpenSSL, there is value in upstream vs. downstream comparisons.
From the original issue:
Adding this as a story from the mail thread "Questions about Service Mesh scaling/higher load scenarios"
With regard to OpenShift AI scale and performance testing, we (OpenShift Serverless) are doing our own testing to have some numbers for OpenShift Serverless with and without Service Mesh integration. In that context, we have some questions:
In general, do you have any recommendations or best practices on how to set up Service Mesh for larger environments and/or higher-load scenarios? For example, we think it is a good idea to enable autoscaling of istio-ingressgateway and to increase requests/limits for the istio-ingressgateway and istio-proxy containers. Is there anything else we should be aware of?
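As one concrete illustration of the autoscaling suggestion above, a minimal HPA sketch for the ingress gateway is shown below. It assumes the default upstream deployment name istio-ingressgateway in the istio-system namespace, and the replica bounds and CPU target are placeholders; the actual values are exactly what this epic's sizing guidance should provide:

```python
import yaml  # PyYAML

# Assumed deployment name/namespace (upstream defaults); placeholder bounds and target.
ingress_gateway_hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "istio-ingressgateway", "namespace": "istio-system"},
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "istio-ingressgateway",
        },
        "minReplicas": 2,    # placeholder: keep at least two replicas for availability
        "maxReplicas": 10,   # placeholder: ceiling derived from expected peak RPS
        "metrics": [{
            "type": "Resource",
            "resource": {
                "name": "cpu",
                "target": {"type": "Utilization", "averageUtilization": 80},
            },
        }],
    },
}

print(yaml.safe_dump(ingress_gateway_hpa, sort_keys=False))
```

The printed manifest could then be applied with `oc apply -f -`.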
We should also provide sizing guidance for the Istio sidecar proxies, gateways and control plane, so that we can add a new perf & scale section to our product documentation.
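For the sidecar part of that sizing guidance, per-workload proxy resources can be tuned with Istio's sidecar resource annotations on the pod template. The sketch below only illustrates the annotation keys; the CPU/memory values are placeholders, not numbers produced by this epic:

```python
# Istio pod-template annotations that override the injected sidecar's resources.
# Values below are placeholders, pending the sizing guidance from this epic.
sidecar_resource_annotations = {
    "sidecar.istio.io/proxyCPU": "100m",         # placeholder request
    "sidecar.istio.io/proxyMemory": "128Mi",     # placeholder request
    "sidecar.istio.io/proxyCPULimit": "2000m",   # placeholder limit
    "sidecar.istio.io/proxyMemoryLimit": "1Gi",  # placeholder limit
}

# These would go under spec.template.metadata.annotations of a Deployment,
# or be set mesh-wide via the proxy resource defaults in the Istio configuration.
```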
Related information
- blocks: SRVKS-1075 Performance Benchmarking for Serving (Backlog)