OpenShift Request For Enhancement / RFE-8214

Custom IngressController topology awareness for OCP/ROSA

    • Feature Request
    • Resolution: Unresolved
    • Major
    • openshift-4.16
    • Network - Core, ROSA
    • Product / Portfolio Work

      1. Proposed title of this feature request

      Enable topology awareness for custom IngressControllers so that customers can distribute router pods across Availability Zones.

      2. What is the nature and description of the request?

      IBM is requesting the ability to configure topology-aware pod placement for custom IngressController router pods in OpenShift (specifically ROSA STS). As we understand it, making the IngressController (IC) topology aware requires exposing topologySpreadConstraints, or a similar affinity mechanism, in the IC custom resource. We believe this would allow precise control over router pod distribution across Availability Zones (AZs), ensuring NLB health and per-AZ resilience.

      3. Why does the customer need this? (List the business requirements here)

      IBM has reported issues in their production environment, which spans six Availability Zones (AZs). In their architecture, they do not rely on AWS DNS to resolve IPs for healthy Network Load Balancer (NLB) targets as described in [0]. Instead, they have implemented a design in which an individual NLB is provisioned per customer, enabling PrivateLink access into IBM’s product VPC. These are referred to as “customer load balancers.”

      Each customer NLB is configured to forward traffic to a shared internal NLB (referred to as the “shared load balancer”) that fronts a private IngressController (IC). This shared NLB then routes traffic into the appropriate ROSA cluster.

      To ensure AZ-local traffic handling, IBM configures the customer NLBs with target groups that forward traffic to one shared NLB network interface per AZ. This design assumes that a healthy router pod (i.e., a healthy target) is available in each AZ. However, with the current ROSA setup, there are cases where a router pod is not scheduled in a given AZ, resulting in an unhealthy target in that AZ. When end-client traffic is routed to such an AZ, the connection fails, as the target is not marked healthy.
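
      The failure condition described above can be sketched as a simple coverage check. This is only an illustration, assuming the zone of each running router pod is already known; the function name and zone names are hypothetical and do not come from any OpenShift or AWS API.

```python
from collections import Counter


def zones_without_router(zones, pod_zones):
    """Return the zones that have no router pod, in the order given.

    zones:     every Availability Zone the shared NLB has an interface in.
    pod_zones: the zone of each running router pod (one entry per pod).
    """
    per_zone = Counter(pod_zones)
    return [z for z in zones if per_zone[z] == 0]


# Hypothetical six-AZ cluster with six router replicas (placeholder zone names).
zones = ["az1", "az2", "az3", "az4", "az5", "az6"]
pod_zones = ["az1", "az1", "az2", "az3", "az4", "az5"]  # two pods landed in az1

print(zones_without_router(zones, pod_zones))  # ['az6']
```

      Any zone returned here corresponds to a shared NLB interface with no healthy target behind it, which is the case in which connections through the customer NLB fail.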

      Due to the critical nature of their environment, IBM cannot modify this architecture: the system is already in production and serving external customers, so any change that could introduce downtime is not acceptable.

       [0] https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-troubleshooting.html#dns-name-resolution

      4. List any affected packages or components.

      cluster-ingress-operator / IngressController

      • Currently, the router pods of an IngressController are scheduled with a soft topologySpreadConstraints setting (maxSkew: 1, whenUnsatisfiable: ScheduleAnyway), and this setting cannot be modified.
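
      The difference between the current soft constraint and a hard one can be illustrated with a deliberately simplified model of the skew rule. It is a sketch, not the scheduler's implementation: it ignores node capacity, taints, node selectors, and scoring, and the function name is hypothetical. It assumes the usual definition of skew as the pod count in a zone (including the new pod) minus the lowest count in any zone.

```python
def allowed_zones(counts, max_skew=1, when_unsatisfiable="DoNotSchedule"):
    """Return the zones where one more router pod could be placed.

    counts: mapping of zone name -> number of matching pods already there.
    """
    if when_unsatisfiable == "ScheduleAnyway":
        # Soft constraint: skew is only a preference, so no zone is excluded.
        return sorted(counts)
    low = min(counts.values())
    # Hard constraint: placing the pod must not push the skew above max_skew.
    return sorted(z for z, n in counts.items() if n + 1 - low <= max_skew)


# Hypothetical state: three zones, az3 has no router pod yet.
counts = {"az1": 1, "az2": 1, "az3": 0}

print(allowed_zones(counts, when_unsatisfiable="DoNotSchedule"))   # ['az3']
print(allowed_zones(counts, when_unsatisfiable="ScheduleAnyway"))  # ['az1', 'az2', 'az3']
```

      With ScheduleAnyway the empty zone is preferred but not required, so other scheduling factors can still place the pod elsewhere and leave an AZ without a router pod.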

              mcurry@redhat.com Marc Curry
              rhn-support-alvolkov Alex Volkov