[RFE-5119] Minimizing downtime when upgrading clusters with Egress IPs - Red Hat Issue Tracker

Type: Feature Request
Resolution: Done
Priority: Normal
Fix Version/s: openshift-4.17
Affects Version/s: openshift-4.12
Component/s: SDN
Labels:
- rfe-approved-to-closed-done

Architecture: x86_64

1. Proposed title of this feature request

Minimizing downtime when upgrading clusters with Egress IPs

2. What is the nature and description of the request?

The customer uses Egress IPs (on OpenShift SDN / OVS) for namespaces to control network flow to backend systems, mostly databases. When a node reboots during an upgrade, reconciling the Egress IPs takes some time, and connections to these systems stall for roughly 30 s, causing requests and transactions to fail at the application level. As a result, every business application that uses Egress IPs is affected by OCP upgrades.

3. Why does the customer need this? (List the business requirements here)

This can be answered by means of the following User Story:

As the owner of a PaaS infrastructure service based on the OpenShift Container Platform, I would like to be able to perform OCP upgrades with zero downtime, i.e. without disrupting the availability of the critical business applications hosted on the clusters I am responsible for, so that I do not have to deal with tickets from application teams and complaints about downtime from the impacted business units.

Business Impact:

OCP upgrades reduce the availability of business applications. With 350 apps interrupted by every OCP upgrade, and assuming a cluster is upgraded 10 times a year, the cumulative outage across all apps is 10 * 350 * 30 s = 105000 s = 1750 min ~ 29 h / year.
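
The estimate can be reproduced with a short calculation. This is a minimal sketch using only the figures quoted in this request (350 apps, 10 upgrades per year, roughly 30 s of stalled connectivity per upgrade). The function name is illustrative, and the result is the outage summed across all apps, not the downtime seen by any single application.

```python
def annual_outage_seconds(apps: int, upgrades_per_year: int, stall_seconds: float) -> float:
    """Cumulative application-level outage per year, summed over all apps."""
    return apps * upgrades_per_year * stall_seconds


# Figures taken from the business-impact statement in this request.
total_s = annual_outage_seconds(apps=350, upgrades_per_year=10, stall_seconds=30)
print(f"{total_s:.0f} s = {total_s / 60:.0f} min = {total_s / 3600:.1f} h per year")
# prints "105000 s = 1750 min = 29.2 h per year"
```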

NOTE: Because the IPv4 range available to the RHOCP platform is constrained, the goal is to eventually move to OVN-K and share the same IPv4 EgressIPs across multiple namespaces. This will widen the business impact of EgressIP reconciliation during disruptive activities (i.e. cluster upgrades and node reboots).

4. List any affected packages or components.

OpenShift SDN (currently) and OpenShift with OVN-Kubernetes (later)

Assignee: Marc Curry
Reporter: François Charette
Votes: 0
Watchers: 7
Created: 2024/01/26 3:22 PM
Updated: 2025/03/21 8:24 AM
Resolved: 2024/02/07 7:00 AM
