-
Bug
-
Resolution: Unresolved
-
4.17, 4.18, 4.19, 4.20, 4.21
Description of problem:
Hairpin connections fail on clusters deployed with a router that uses an NLB with the internal scheme. The failure affects any application whose client and server are hosted on the same node and that is exposed by a Service type-LoadBalancer NLB (NLB only).
The CCM creates the NLB with the preserve source IP address attribute enabled by default. A feature was recently implemented in the CCM Service controller to fix the issue: an annotation configures the Target Group attributes so that client IP preservation can be disabled and, optionally, proxy protocol enabled if the backend supports it and needs to track the source IP.
The bug on OpenShift CCM is tracked by https://issues.redhat.com/browse/OCPBUGS-58456
More information on the CCM changes: https://github.com/kubernetes/cloud-provider-aws/blob/master/docs/service_controller.md#target-group-attributes-for-service-type-loadbalancer-nlb-
The changes will be available in OpenShift CCM after the following PR (o/k 4.21 / 1.34): https://github.com/openshift/cloud-provider-aws/pull/112
The limitation affects ROSA private deployments, where customers report issues when the app runs on the same node as the router.
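For illustration, a minimal sketch of a Service manifest that disables client IP preservation through the annotation described above. The target group attributes annotation key and the attribute names are assumptions taken from the linked CCM documentation and should be verified there; `build_service`, the Service name, selector, and ports are hypothetical.

```python
import json

# Assumed annotation key for Target Group attributes; confirm against the
# linked cloud-provider-aws service_controller.md before relying on it.
TG_ATTRS_ANNOTATION = (
    "service.beta.kubernetes.io/aws-load-balancer-target-group-attributes"
)


def build_service(name, selector, port, target_port,
                  preserve_client_ip=False, proxy_protocol_v2=False):
    """Return a Service type-LoadBalancer (internal NLB) manifest as a dict."""
    # Attribute names follow the AWS target group attribute naming (assumed).
    attrs = ["preserve_client_ip.enabled=" + str(preserve_client_ip).lower()]
    if proxy_protocol_v2:
        # Only useful if the backend understands proxy protocol v2.
        attrs.append("proxy_protocol_v2.enabled=true")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {
                "service.beta.kubernetes.io/aws-load-balancer-type": "nlb",
                "service.beta.kubernetes.io/aws-load-balancer-internal": "true",
                TG_ATTRS_ANNOTATION: ",".join(attrs),
            },
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": selector,
            "ports": [
                {"port": port, "targetPort": target_port, "protocol": "TCP"}
            ],
        },
    }


# Hypothetical sample app; the JSON output can be piped to `oc apply -f -`.
manifest = build_service("hairpin-sample", {"app": "hairpin-sample"}, 80, 8080)
print(json.dumps(manifest, indent=2))
```

With client IP preservation disabled, the backend sees the NLB address as the source, which is why proxy protocol is the suggested way to recover the client IP when the backend needs it.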
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
Scenario 1:
1. Install a ROSA or OCP-AWS cluster (with NLB) using the internal publish strategy, and a router with one replica
2. Deploy and expose a sample app pinned to the same node as the router
3. Test accessing the app
Scenario 2:
1. Expose the app using the default router (created with NLB) in private subnets
Actual results:
Scenario 1) Connection timeout for single node
Scenario 2) Intermittent connection timeouts, depending on the number of router replicas in the cluster
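A rough way to quantify the timeouts is to open several TCP connections in a row and count how many fail. The sketch below is an assumption about how one might measure this, not part of the CCM e2e; `probe` is a hypothetical helper, and it would need to run from a pod or host on the same node as the router for the hairpin path to be exercised.

```python
import socket


def probe(host, port, attempts=20, timeout=5.0):
    """Open `attempts` TCP connections and count outcomes.

    Returns a dict with the number of successful connects ("ok"),
    connects that hit the timeout ("timeout"), and other socket
    errors such as connection refused ("error").
    """
    results = {"ok": 0, "timeout": 0, "error": 0}
    for _ in range(attempts):
        try:
            # A fresh connection per attempt, so the NLB may pick a
            # different target each time.
            with socket.create_connection((host, port), timeout=timeout):
                results["ok"] += 1
        except socket.timeout:
            results["timeout"] += 1
        except OSError:
            results["error"] += 1
    return results
```

For example, `probe("app.apps.example.com", 443)` (hypothetical route hostname) would be expected to report only timeouts in Scenario 1, and a mix of successes and timeouts in Scenario 2, where the ratio depends on how many router replicas sit on other nodes.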
Expected results:
Hairpin connections work through the private routers
Additional info:
CCM-AWS e2e is available to test the scenario: https://github.com/kubernetes/cloud-provider-aws/blob/37381a3a5b7551075e15b38910252a1e33c8d4e9/tests/e2e/loadbalancer.go#L130-L250
CCM-AWS documentation: https://github.com/kubernetes/cloud-provider-aws/blob/master/docs/service_controller.md#target-group-attributes-for-service-type-loadbalancer-nlb-
OpenShift CCM-AWS hairpin bug: https://issues.redhat.com/browse/OCPBUGS-58456
Slack thread: https://redhat-internal.slack.com/archives/CCH60A77E/p1745435593239899 https://redhat-internal.slack.com/archives/CCH60A77E/p1749137394974759?thread_ts=1745435593.239899&cid=CCH60A77E
Relates to:
- OCPBUGS-58456 CCM/AWS - hairpin connection failed when Service type-LoadBalancer NLB with internal scheme (ASSIGNED)
- OCPBUGS-16199 Add warning about internal NLBs Client IP Preservation issue (Closed)
- OCPSTRAT-2310 Disabling Client IP Preservation for hairpinning issue in CCM, Ingress (In Progress)