- Feature Request
- Resolution: Unresolved
- 4.16
- Product / Portfolio Work
1. Proposed title of this feature request:
- Add support for IngressController handling of topologySpreadConstraints spec options, enabling HA multi-zone placement rule management for router pods.
2. What is the nature and description of the request?
- Currently, IngressController objects do not support modifying the topologySpreadConstraints spec options on the deployments they manage, which means router pods are subject to the default base placement rules. For high-availability, multi-availability-zone placement rules, this can prove challenging with defaults that include whenUnsatisfiable: ScheduleAnyway and maxSkew: 1. It should be possible to manage these options without the ingress operator reverting the deployment to its defaults; a hypothetical sketch of the requested capability follows.
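For illustration only, a hypothetical sketch of what such a tunable could look like on the IngressController (the topologySpreadConstraints field shown here does not exist in the current IngressController API; the field name and values are assumptions used to convey the request):
  # Hypothetical field, not part of the current IngressController API
  apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: default
    namespace: openshift-ingress-operator
  spec:
    replicas: 3
    topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule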
3. Why does the customer need this? (List the business requirements here)
- Multi-zone clusters require at least 1 router pod per zone to be able to handle traffic from sharded router pods for different projects. The default spec is below:
  topologySpreadConstraints:
  - labelSelector:
      matchExpressions:
      - key: ingresscontroller.operator.openshift.io/hash
        operator: In
        values:
        - <string>
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
- The whenUnsatisfiable option ScheduleAnyway, coupled with maxSkew: 1, allows placement of pods on first-best-fit nodes, which may lead to an uneven placement of pods across the 3 (or more) zones. Being able to manage this spec option to set custom expressions, change the skew value, or set whenUnsatisfiable to DoNotSchedule would help prevent pods from being unevenly distributed for critical infra (router pods); see the example constraint after this list.
- When router pods are not placed properly in all zones, traffic from the zones without a router pod is blocked.
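As an illustration, the kind of override the customer wants to be able to express on the router deployment (values are examples; the only change from the default spec above is whenUnsatisfiable: DoNotSchedule):
  topologySpreadConstraints:
  - labelSelector:
      matchExpressions:
      - key: ingresscontroller.operator.openshift.io/hash
        operator: In
        values:
        - <string>
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule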
- Current workarounds available:
- The descheduler may be useful in ensuring that the pods are redeployed correctly when skew occurs during reboots or upgrades; a sketch of a possible configuration follows.
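A minimal sketch, assuming the Kube Descheduler Operator is installed and that the TopologyAndDuplicates profile (which evicts pods violating topology spread constraints) fits the environment; the interval and profile choice are assumptions:
  apiVersion: operator.openshift.io/v1
  kind: KubeDescheduler
  metadata:
    name: cluster
    namespace: openshift-kube-descheduler-operator
  spec:
    # Run descheduling checks every hour (illustrative value)
    deschedulingIntervalSeconds: 3600
    profiles:
    - TopologyAndDuplicates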
- Increasing the replica count for these router pods will help ensure that, with a maxSkew value of 1, the likelihood of one availability zone being left unscheduled is reduced; 3 may not be enough. An example is sketched below.
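For example, raising the replica count on the IngressController itself (spec.replicas is an existing IngressController field; the value 6 is only illustrative):
  apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: default
    namespace: openshift-ingress-operator
  spec:
    replicas: 6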
- Changing the whenUnsatisfiable value to DoNotSchedule means the deployment will report as not fully rolled out until nodes ARE available to take the pods across zones. This should result in (eventual) correct placement of pods rather than immediate satisfaction of pod creation driven by skewed weights in the selection ordering. This would probably also be combined with a higher replica count, depending on workload needs, since it reduces the global minimums unless otherwise defined.
-> NOTE: this requires uncoupling the router<shard> from its ingresscontroller via the steps below:
Since the ingresscontroller will override any config changes to the deployment, it has to be removed so that the deployment can be modified to meet the need. Start by exporting the objects that the target ingresscontroller created (and the ingresscontroller itself, for recovery as needed):
- Copy the manifests for any relevant deployment, service, configmaps, and secrets from openshift-ingress to local yaml files for creation later.
- Clean up the manifests by removing instance-specific information such as UIDs, creationTimestamps, and status fields, to make these objects generic and ready for creation as "new" objects.
- Remove managed-by labeling/annotations to ensure these objects are standalone/unmanaged.
- Remove the references to the CRL in the deployment details.
- Add the modified topologySpreadConstraints spec changes to the deployment yaml; an illustrative snippet follows this list.
- Delete the old ntr IngressController object and allow the pods to be terminated.
- Deploy all the updated deployment, service, and secret objects directly to recreate the same object stack, but this time with no ingresscontroller managing them.
- Test the route on the custom router deployment and validate that the changes are working as expected.
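A rough sketch of the edits described above, applied to the exported router deployment (the deployment name is illustrative and the exact fields present will depend on the exported manifest):
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: router-<shard>    # keep the original name from the exported manifest
    namespace: openshift-ingress
    # removed from the exported manifest: metadata.uid, metadata.creationTimestamp,
    # the status block, and any managed-by labels/annotations
  spec:
    template:
      spec:
        topologySpreadConstraints:
        - labelSelector:
            matchExpressions:
            - key: ingresscontroller.operator.openshift.io/hash
              operator: In
              values:
              - <string>
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule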
- I have worked with other customers who have created multiple shards for each availability zone as a solution for similar placement needs, which is also an option here. For example: a nodeSelector preference for ingresscontroller router-shard-a would place its router pods (1 or 2 replicas) on nodes in zone -1a, the nodeSelector value for router-shard-b would place its pods (1 or 2 replicas) on nodes in zone -1b, and so on. This would ensure consistent pod placement in specific zones without having to rely on scheduling choice order across all selected infra nodes; a sketch of one such shard follows.
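A sketch of what one such per-zone shard could look like (the shard name, domain, route selector label, and zone label value are assumptions; nodePlacement.nodeSelector and routeSelector are existing IngressController fields):
  apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: router-shard-a
    namespace: openshift-ingress-operator
  spec:
    replicas: 2
    domain: shard-a.apps.example.com    # assumed shard domain
    routeSelector:
      matchLabels:
        shard: a                        # assumed route label for this shard
    nodePlacement:
      nodeSelector:
        matchLabels:
          topology.kubernetes.io/zone: us-east-1a    # assumed zone label value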