OpenShift Bugs / OCPBUGS-63219

CIO/AWS - hairpin connection failed when router is NLB with internal scheme


    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Undefined Undefined
    • 4.17, 4.18, 4.19, 4.20, 4.21
    • Networking / router
    • Rejected

      Description of problem:

      Hairpin connections fail on clusters deployed with a router that uses an NLB with the internal scheme.
      
      The hairpin failure affects any application whose client and server are hosted on the same node and that is exposed by a Service of type LoadBalancer backed by an NLB (NLB only). By default, the CCM creates the NLB with the preserve client IP attribute enabled. A feature was recently added to the CCM Service controller to fix the issue: an annotation configures the Target Group attributes to disable client IP preservation and, optionally, to enable proxy protocol if the backend supports it and needs to track the source IP.
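
      For reference, a minimal sketch of a Service manifest that turns client IP preservation off through that annotation. The annotation key and the `key=value` attribute format are taken from my reading of the upstream Service controller documentation linked below, so treat them as assumptions and confirm them against the CCM version in use. The function names, Service name, and namespace are hypothetical.

```python
# Annotation key as I understand it from the upstream cloud-provider-aws
# Service controller docs; this is an assumption, verify it for your CCM version.
TG_ATTRIBUTES_ANNOTATION = (
    "service.beta.kubernetes.io/aws-load-balancer-target-group-attributes"
)


def target_group_attributes(preserve_client_ip: bool, proxy_protocol_v2: bool) -> str:
    """Render NLB target group attributes as a comma-separated key=value string."""
    attrs = {
        "preserve_client_ip.enabled": preserve_client_ip,
        "proxy_protocol_v2.enabled": proxy_protocol_v2,
    }
    return ",".join(f"{key}={str(value).lower()}" for key, value in attrs.items())


def hairpin_safe_service(name: str, namespace: str, selector: dict, port: int) -> dict:
    """Build a Service manifest (as a dict) for an internal NLB with
    client IP preservation disabled and proxy protocol left off."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                "service.beta.kubernetes.io/aws-load-balancer-type": "nlb",
                "service.beta.kubernetes.io/aws-load-balancer-internal": "true",
                TG_ATTRIBUTES_ANNOTATION: target_group_attributes(False, False),
            },
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": selector,
            "ports": [{"port": port, "targetPort": port, "protocol": "TCP"}],
        },
    }
```

      This applies to a standalone Service. The default router's Service is managed by the ingress operator, so the equivalent setting would have to be applied there rather than by editing the Service by hand.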
      
      The bug on OpenShift CCM is tracked by https://issues.redhat.com/browse/OCPBUGS-58456
      
      More information for CCM changes: https://github.com/kubernetes/cloud-provider-aws/blob/master/docs/service_controller.md#target-group-attributes-for-service-type-loadbalancer-nlb-
      
      The changes will be available in the OpenShift CCM once the following PR lands (o/k 4.21 / 1.34): https://github.com/openshift/cloud-provider-aws/pull/112
      
      The limitation affects ROSA private deployments, where customers report issues when the app runs on the same node as the router.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      always    

      Steps to Reproduce:

      Scenario 1:    
      1. Install a ROSA or OCP-AWS cluster (with NLB) using the internal publish strategy and a router with one replica
      2. Deploy and expose a sample app pinned to the same node as the router
      3. Test access to the app
          
      Scenario 2:
      1. Expose the app through the default router (created with an NLB) in private subnets
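
      The "test accessing the app" step can be sketched as a small probe that opens a series of fresh TCP connections and counts how many time out. This is only an illustration: the function names are hypothetical, and the hostname in the usage note is a placeholder, not a value from this report.

```python
import socket
from typing import Callable


def tcp_connect(host: str, port: int, timeout: float) -> None:
    """Open and immediately close a TCP connection; raises OSError on failure."""
    with socket.create_connection((host, port), timeout=timeout):
        pass


def probe(host: str, port: int, attempts: int = 20, timeout: float = 3.0,
          connect: Callable[[str, int, float], None] = tcp_connect) -> dict:
    """Try `attempts` fresh connections and count successes, timeouts, and other errors."""
    counts = {"ok": 0, "timeout": 0, "error": 0}
    for _ in range(attempts):
        try:
            connect(host, port, timeout)
            counts["ok"] += 1
        except (socket.timeout, TimeoutError):
            # A hairpinned connection is expected to end up here.
            counts["timeout"] += 1
        except OSError:
            # Refused connections, DNS failures, and similar.
            counts["error"] += 1
    return counts
```

      Run from a pod on the same node as the router, for example `probe("myapp.apps.example.invalid", 443)`. Each attempt opens a new connection, so the load balancer can pick a different target each time.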

      Actual results:

      Scenario 1) Connection timeout on the single node
      Scenario 2) Intermittent connection timeouts, depending on the number of router replicas in the cluster
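
      As a rough way to reason about how the timeout rate relates to the replica count, the sketch below estimates the share of connections that hairpin. It rests on assumptions that are not from this report: each router replica runs on a distinct node, only those nodes are healthy NLB targets, the NLB spreads new connections uniformly across them, and the client runs on one of those nodes. Real distribution (per-zone routing, flow hashing) will differ, so read the result as an order of magnitude. The function name is hypothetical.

```python
def expected_hairpin_failure_rate(router_replicas: int,
                                  client_on_router_node: bool = True) -> float:
    """Estimate the fraction of new connections routed back to the client's own node.

    Assumes one router replica per node and a uniform spread across targets.
    """
    if router_replicas < 1:
        raise ValueError("router_replicas must be at least 1")
    if not client_on_router_node:
        # The client's node is not a target, so no connection can hairpin.
        return 0.0
    return 1.0 / router_replicas
```

      Under these assumptions a single replica fails every time, which matches Scenario 1, and adding replicas lowers the failure rate without removing it, which matches the intermittent timeouts in Scenario 2.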

      Expected results:

          Hairpin connections work through private routers

      Additional info:

      CCM-AWS e2e is available to test the scenario: https://github.com/kubernetes/cloud-provider-aws/blob/37381a3a5b7551075e15b38910252a1e33c8d4e9/tests/e2e/loadbalancer.go#L130-L250
      CCM-AWS documentation: https://github.com/kubernetes/cloud-provider-aws/blob/master/docs/service_controller.md#target-group-attributes-for-service-type-loadbalancer-nlb-
      OpenShift CCM-AWS hairpin bug: https://issues.redhat.com/browse/OCPBUGS-58456
      Slack thread: https://redhat-internal.slack.com/archives/CCH60A77E/p1745435593239899 https://redhat-internal.slack.com/archives/CCH60A77E/p1749137394974759?thread_ts=1745435593.239899&cid=CCH60A77E
      
      

              nid-team-bot NID Team Bot
              rhn-support-mrbraga Marco Braga
              Anurag Saxena Anurag Saxena