Feature Request
Resolution: Unresolved
1. Proposed title of this feature request
Ability to configure load balancers for scaling and security compliance in Azure
2. What is the nature and description of the request?
Customers are attempting to scale clusters on Azure and are looking for a smoother user experience. The ARO team has identified a handful of issues customers experience when scaling their clusters on Azure. These issues center on the load balancers and network settings in Azure.
This RFE applies to Azure clusters with an OutboundType of LoadBalancer. In this scenario, the VMs in the backend pool of the load balancer use its frontend IP configurations for egress. Because of the way outbound access works in Azure, Source Network Address Translation (SNAT) is applied to egress connections. Azure provides 64,000 SNAT ports per public IP address, so as a cluster scales it can exhaust these SNAT ports when initiating outbound connections. See https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-outbound-connections#scenarios
OpenShift currently configures LoadBalancer outbound types as follows:
- If a private cluster is created, create an outbound rule and add a single IP address to the outbound rule.
  - This allocates a default of 1024 SNAT ports per VM in the backend pool, which limits the number of nodes that can be added to the OpenShift cluster. The current recommended solution is to scale the number of SNAT ports per VM down to 64, but this can result in outbound SNAT exhaustion issues.
  - This also limits the number of worker nodes that can be added to the cluster (59 worker nodes in clusters born on or after 4.5).
- If a public cluster is created, use the default outbound access. (Default outbound access in this case uses all public IP addresses configured on the frontend IP configs of the public load balancer for egress.)
  - This is rated the worst type of outbound access in the Azure documentation linked above.
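The node limits above follow from simple port arithmetic. The sketch below is illustrative only: it assumes the 64,000-ports-per-IP figure quoted in this RFE and a control plane of three nodes sharing the same backend pool (an assumption, not stated in this ticket). The function and constant names are hypothetical.

```python
# Illustrative arithmetic only; names are hypothetical, not an OpenShift or Azure API.
SNAT_PORTS_PER_FRONTEND_IP = 64_000  # figure quoted in this RFE
CONTROL_PLANE_NODES = 3              # assumption: three control plane VMs in the same pool


def max_backend_vms(ports_per_vm: int, frontend_ips: int = 1) -> int:
    """Largest backend pool that a fixed per-VM port allocation can serve."""
    if ports_per_vm <= 0 or frontend_ips <= 0:
        raise ValueError("ports_per_vm and frontend_ips must be positive")
    return (SNAT_PORTS_PER_FRONTEND_IP * frontend_ips) // ports_per_vm


def max_workers(ports_per_vm: int, frontend_ips: int = 1) -> int:
    """Worker nodes left after reserving room for the control plane."""
    return max(0, max_backend_vms(ports_per_vm, frontend_ips) - CONTROL_PLANE_NODES)


if __name__ == "__main__":
    print(max_workers(1024))  # 62 VMs fit, minus 3 control plane nodes
    print(max_workers(64))    # far more nodes, but only 64 ports each
```

Under these assumptions, one IP at 1024 ports per VM yields the 59-worker ceiling mentioned above, while dropping to 64 ports per VM raises the node ceiling at the cost of very few ports per node.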
There is ongoing OCP work to add support for NAT gateway, but since existing clusters have been created with the LoadBalancer outbound type, we need a solution that lets customers scale properly while accounting for outbound SNAT exhaustion. The NAT gateway outbound type is unrelated to this RFE, but some elements of this RFE may be useful to incorporate there (such as OpenShift managing the public IP addresses associated with the NAT gateway).
3. Why does the customer need this? (List the business requirements here)
To address these scaling needs, we propose the following user stories. They seek to solve the issues with scaling Azure clusters of OutboundType==LoadBalancer.
User stories:
- As an OpenShift customer I need the ability to configure additional public IPs to be used for egress traffic so that I do not experience outbound SNAT exhaustion network errors with my cluster.
  - Allow bringing your own or OpenShift created/managed
    - Public IP address
    - Public IP Prefix
    - https://learn.microsoft.com/en-us/azure/load-balancer/outbound-rules#scenario1out
- As an OpenShift customer I need the ability to configure the SNAT port allocation to best fit my cluster and workload needs.
  - Allow setting of SNAT port allocation: https://learn.microsoft.com/en-us/azure/load-balancer/outbound-rules#scenario2out
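To make the trade-off between the two user stories concrete, here is a rough planning sketch. It assumes the 64,000-ports-per-IP figure from this RFE and that a manual per-VM allocation must be a multiple of 8 (our reading of the Azure outbound rules documentation; treat it as an assumption). All names are hypothetical and this is not a proposed OpenShift interface.

```python
# Hypothetical planning helpers; not an existing OpenShift or Azure API.
SNAT_PORTS_PER_FRONTEND_IP = 64_000  # figure quoted in this RFE
PORT_GRANULARITY = 8                 # assumption: allocations are multiples of 8


def frontend_ips_needed(backend_vms: int, ports_per_vm: int) -> int:
    """Public IPs required so every VM gets the requested SNAT ports."""
    if backend_vms <= 0 or ports_per_vm <= 0:
        raise ValueError("backend_vms and ports_per_vm must be positive")
    total_ports = backend_vms * ports_per_vm
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_ports // SNAT_PORTS_PER_FRONTEND_IP)


def ports_per_vm_available(backend_vms: int, frontend_ips: int) -> int:
    """Largest per-VM allocation that fits, rounded down to the granularity."""
    if backend_vms <= 0 or frontend_ips <= 0:
        raise ValueError("backend_vms and frontend_ips must be positive")
    raw = (SNAT_PORTS_PER_FRONTEND_IP * frontend_ips) // backend_vms
    return raw - (raw % PORT_GRANULARITY)


if __name__ == "__main__":
    # A 120-VM pool that keeps the 1024-port default needs a second IP.
    print(frontend_ips_needed(120, 1024))
    # With two IPs, the same pool could be given slightly more than 1024 ports.
    print(ports_per_vm_available(120, 2))
```

The first helper corresponds to adding public IPs or a prefix to the outbound rule; the second corresponds to tuning the SNAT port allocation for a fixed set of IPs.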
4. List any affected packages or components.
- We request that OpenShift provide some method of managing this configuration, which is currently not managed by any OCP component, whether through the introduction of a new component or something else.
- openshift-installer - should be aware of these restrictions with OutboundType == LoadBalancer and accommodate them
- relates to: CORS-2767 Hybrid SRE: Add support to disable SNAT for outbound traffic on Azure (Closed)