Feature Request
Resolution: Done
Normal
openshift-4.12
None
x86_64
1. Proposed title of this feature request
Minimizing downtime when upgrading clusters with Egress IPs
2. What is the nature and description of the request?
The customer uses Egress IPs (with OpenShift SDN / OVS) on namespaces to control network traffic to backend systems, mostly databases. When a node reboots during an upgrade, the Egress IPs take some time to be reconciled. During that time, connections to these backend systems stall for roughly 30 s, causing requests and transactions to fail at the application level. As a result, every business application that uses Egress IPs is affected by each OCP upgrade.
3. Why does the customer need this? (List the business requirements here)
This can be answered by means of the following User Story:
As the owner of a PaaS infrastructure service based on the OpenShift Container Platform, I would like to be able to perform OCP upgrades with zero downtime, i.e. without disrupting the availability of the critical business applications hosted on the clusters I am responsible for, so that I do not have to deal with tickets from application teams and complaints about downtime from the affected business units.
Business Impact:
OCP upgrades reduce the availability of business applications. If 350 apps are interrupted by every OCP upgrade and a cluster is upgraded 10 times a year, the cumulative outage is 10 * 350 * 30 s = 105,000 s = 1,750 min ≈ 29 h per year (about 5 min per application per year).
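The outage arithmetic can be checked with a minimal Python sketch. It assumes, as the estimate does, that every upgrade interrupts each Egress-IP-dependent app exactly once for the same stall duration (about 30 s). The function name `yearly_outage_seconds` is hypothetical and used only for illustration.

```python
"""Rough estimate of yearly Egress IP-related disruption from OCP upgrades.

Assumption (taken from the estimate, not measured here): each upgrade
interrupts every Egress-IP-dependent app once, for the same stall duration.
"""


def yearly_outage_seconds(apps: int, upgrades_per_year: int, stall_s: float) -> float:
    """Cumulative application downtime per year, summed over all apps."""
    return apps * upgrades_per_year * stall_s


# Values from the business-impact estimate: 350 apps, 10 upgrades/year, ~30 s stall.
total = yearly_outage_seconds(apps=350, upgrades_per_year=10, stall_s=30)
# Downtime seen by a single application over the same year.
per_app = yearly_outage_seconds(apps=1, upgrades_per_year=10, stall_s=30)

print(f"cumulative: {total:.0f} s = {total / 60:.0f} min = {total / 3600:.1f} h")
print(f"per app:    {per_app:.0f} s = {per_app / 60:.0f} min")
```

Running it prints a cumulative figure of 105000 s (1750 min, about 29.2 h) and a per-application figure of 300 s (5 min). The distinction matters: the per-app number is what a single team experiences, while the cumulative number reflects the total business impact across the platform.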
NOTE: Because the IPv4 range available to the RHOCP platform is constrained, the goal is to eventually move to OVN-Kubernetes (OVN-K) and share the same IPv4 Egress IPs across multiple namespaces. This will widen the business impact of Egress IP reconciliation during disruptive activities (i.e. cluster upgrades and node reboots), since a single reconciliation delay will then affect every namespace sharing that Egress IP.
4. List any affected packages or components.
OpenShift SDN (currently) and OpenShift with OVN-Kubernetes (later)