Feature Request
Resolution: Unresolved
Normal
Product / Portfolio Work
1. Proposed title of this feature request
"Change the ingress controller maxUnavailable setting to accommodate rolling updates"
2. What is the nature and description of the request?
We run 3 replicas of our router ingress controller pods, and the current rolling-update strategy of maxUnavailable: 50% takes out 2 of the 3 pods at a time whenever we have app rollout/patching activity. This can cause 504s on these shared clusters, because the single remaining pod cannot handle all the traffic during the window while the other pods recycle. We want to change the rolling-update strategy to maxUnavailable: 1, but so far we have not found a way to persist this config change and roll it out to our ingress controller deployments; the operator appears to block the change. Below is the update from the customer in our discussion:
Beyers, Christopher: Have you attempted to modify the PDB? I expect that the operator will roll it back, but maybe not.
The 3 rolling update configurations:

1. IngressController unsupportedConfigOverrides:

   maxUnavailable: 1      # absolute value

2. Deployment strategy:

   maxUnavailable: "50%"  # percentage: 50% of 3 = 1.5 → rounds UP to 2 pods

3. PodDisruptionBudget:

   maxUnavailable: 25%    # percentage: 25% of 3 = 0.75 → rounds UP to 1 pod
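For reference, the override the team attempted might look something like the following sketch. The schema accepted under unsupportedConfigOverrides is intentionally undocumented and version-dependent, so the field path shown here is an assumption, not a confirmed API:

```yaml
# Hypothetical IngressController patch; unsupportedConfigOverrides is
# unsupported/undocumented, and the operator may ignore or revert it.
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  unsupportedConfigOverrides:
    deployment:
      strategy:
        rollingUpdate:
          maxUnavailable: 1   # absolute value instead of "50%"
```

As described below, the operator in 4.17.35 appears to ignore this path, which is the core of the request.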
Which one is actually used?

For rolling updates (application updates):
• Deployment strategy (maxUnavailable: "50%") controls this
• Result: up to 2 pods can be unavailable during app rollouts
• PDB is ignored during rolling updates

For external disruptions (node maintenance, manual evictions):
• PodDisruptionBudget (maxUnavailable: 25%) controls this
• Result: only 1 pod can be disrupted during maintenance
• Deployment strategy is ignored for external disruptions

For IngressController unsupportedConfigOverrides:
• Currently ignored by the operator (as we've seen)
• Should override the deployment strategy but doesn't
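As a concrete illustration of the external-disruption side, a PodDisruptionBudget of the shape described above might look like this sketch (the object name and selector labels are assumptions for illustration, not values read from the cluster):

```yaml
# Illustrative PDB for a 3-replica router deployment; name and
# selector labels are assumed, not taken from the affected cluster.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: router-default
  namespace: openshift-ingress
spec:
  maxUnavailable: 25%   # 25% of 3 replicas → budget of 1 pod per eviction
  selector:
    matchLabels:
      ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
```

Note this object only governs evictions (node drains, maintenance); it has no effect on the Deployment's own rolling update.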
The problem:

With 3 replicas:
• App rollouts: can take down 2 pods (50% of 3 = 1.5 → 2)
• Node maintenance: can only take down 1 pod (25% of 3 = 0.75 → 1)

This inconsistency means:
• Application updates are risky (only 1 pod left)
• Infrastructure maintenance is safe (2 pods left)

The Deployment strategy is what's causing your outage risk during application rollouts, not the PDB.
So it really seems like a bug to have inconsistent patterns in the rolling-update config; shouldn't node updates and app updates be in sync in how they disrupt the pods?
It's really more of a design decision: the PDB was specifically designed to control behavior under external influences, whereas the update strategy is under app-team control.
It's the PDB that is biting you, correct?
When we do planned patching maintenance, it appears to perform app rollouts, so what is used is the deployment rolling-update strategy, which is set to 50% maxUnavailable. It was during that patching window that the tenants' alerts for their route returned those 504s and raised the issue to our attention. When we dug deeper, Q was able to cross-check the timestamps of the 504s against the pods in play and found that we dropped 2 of 3 pods at one time during this patching (cluster upgrade) window. It also diagnosed the single pod serving traffic for that short period as overwhelmed with requests, which led to the intermittent 504s. Our worry is that as we migrate more tenants over to these api/web models, this problem will only grow.
The OpenShift 4.17.35 Ingress Operator appears to ignore the deployment.spec.strategy override in unsupportedConfigOverrides. This suggests one of the following:
1. This path is not supported in this OpenShift version.
2. The operator doesn't allow deployment strategy overrides for safety reasons.
3. Different syntax may be required.
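For comparison, the rollout behavior being requested corresponds to a Deployment strategy along these lines. This is a sketch of the desired end state for the router Deployment, not something the operator currently accepts or persists:

```yaml
# Desired rolling-update behavior for a 3-replica router Deployment:
# surge one new pod, never take more than one old pod down at a time.
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1   # absolute value avoids percentage rounding
```

With this shape, an app rollout would match the node-maintenance behavior (at most 1 of 3 pods down), which is exactly the consistency the customer is asking for.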
3. Why does the customer need this? (List the business requirements here)
Because 1 ingress pod is not enough to handle the traffic during rolling updates, tenants see 504 errors during upgrades.
4. List any affected packages or components.
OpenShift ingress controller (cluster ingress operator)