  1. OpenShift Specialist Platform Team
  2. SPLAT-2257

[Investigation] AWS/Service/NLB: Explore solution to resolve hairpin connection issue affecting default router service on ROSA HCP

    • Type: Epic
    • Resolution: Done
    • Priority: Critical
    • Product / Portfolio Work
    • OCPSTRAT-2310: Disabling Client IP Preservation for hairpinning issue in CCM, Ingress
    • 0% To Do, 0% In Progress, 100% Done

      2025.07.16: An AWS engineer reviewed and approved the PR. A maintainer reviewed it and raised a nit; Marco fixed it and sent it back for final review.
      2025.07.08: Asked for reviewers on Kubernetes Slack: https://kubernetes.slack.com/archives/C0LRMHZ1T/p1751985125693769
      2025.07.07: PR https://github.com/kubernetes/cloud-provider-aws/pull/1161 approved by Cloud Team. It now needs a final review from the CCM maintainers.
      2025.07.07: Addressed PR review comments while keeping it safe to run in parallel. Awaiting final review.
      2025.07.04: Rebased and re-ran jobs. PR is waiting for review.
      2025.06.28: Rebased and fixed the e2e to work properly on CCM upstream CI (fixing SCC).
      2025.06.16: Asked managed services to confirm that this behavior matches what the customer is facing with CIO on HCP: https://redhat-internal.slack.com/archives/CCH60A77E/p1750103773186809?thread_ts=1745435593.239899&cid=CCH60A77E (awaiting validation by the managed service teams)
      2025.06.15: Hairpin e2e tests implemented upstream, exposing the problem on NLB-internal deployments: https://github.com/kubernetes/cloud-provider-aws/pull/1161 (awaiting review; upstream CI is currently broken: https://github.com/kubernetes/cloud-provider-aws/issues/1167). An issue for the Service NLB-Internal has been created upstream: https://github.com/kubernetes/cloud-provider-aws/issues/1160 (awaiting triage).
    • S
    • 16

      Epic Goal

      • Reproduce the hairpin connection issue in a ROSA HCP cluster
      • Investigate changes required to allow disabling client IP preservation for ingress controller LoadBalancer service. 

      Why is this important?

      With client IP preservation enabled, a connection fails when a pod opens it to the load balancer service and the load balancer routes it back to the same node where the pod resides. This forces nodes to be dedicated to ingress controllers, which is not preferred.
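
      The failure mode can be illustrated with a toy model. This is only a sketch under simplifying assumptions: pod traffic is assumed to leave the node with the node's IP as its source address, and the drop rule (source equals destination) is a simplification of the hairpin behavior described above. All names and addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str


def forward_via_nlb(client_ip: str, target_ip: str, nlb_ip: str,
                    preserve_client_ip: bool) -> Packet:
    """Return the packet as the target node would see it.

    With client IP preservation the original source address is kept;
    without it, the load balancer's own address becomes the source.
    """
    src = client_ip if preserve_client_ip else nlb_ip
    return Packet(src_ip=src, dst_ip=target_ip)


def is_hairpin_failure(packet: Packet) -> bool:
    """Toy rule (assumption): the connection fails when the target
    receives a packet whose source address is its own address."""
    return packet.src_ip == packet.dst_ip


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    node_a, node_b, nlb = "10.0.1.10", "10.0.1.20", "10.0.0.5"
    cases = [
        ("preserved, same node", node_a, node_a, True),
        ("preserved, other node", node_a, node_b, True),
        ("not preserved, same node", node_a, node_a, False),
    ]
    for label, client, target, preserve in cases:
        pkt = forward_via_nlb(client, target, nlb, preserve)
        print(label, "->", "fails" if is_hairpin_failure(pkt) else "ok")
```

      In this model only the first case fails, which is why disabling client IP preservation (or keeping clients off the ingress nodes) avoids the problem.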

      Scenarios

      As a user of an OpenShift Container Platform cluster installed in AWS, I want to be able to:

      • Annotate a LoadBalancer service that uses NLBs so that client IP preservation is disabled
      • Configure a LoadBalancer ingress controller that uses NLB so that its LoadBalancer service has client IP preservation disabled.
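
      As a rough illustration of the first scenario, the sketch below builds a Service manifest carrying such an annotation. The target-group-attributes annotation key and the `preserve_client_ip.enabled` attribute are the ones documented by the AWS Load Balancer Controller and the ELBv2 target group API; whether the CCM honors the same key is an assumption here and is part of what this epic investigates. The service name, namespace, and selector are hypothetical. The ingress controller scenario is not covered by this sketch.

```python
import json

# Existing annotation that selects an NLB for a LoadBalancer Service.
LB_TYPE = "service.beta.kubernetes.io/aws-load-balancer-type"
# Key documented by the AWS Load Balancer Controller.
# ASSUMPTION: support for this key in the CCM is not confirmed here.
TARGET_GROUP_ATTRS = (
    "service.beta.kubernetes.io/aws-load-balancer-target-group-attributes"
)


def nlb_service_manifest(name: str, namespace: str, selector: dict,
                         port: int, preserve_client_ip: bool) -> dict:
    """Build a LoadBalancer Service manifest for an NLB with the
    client IP preservation attribute set explicitly."""
    flag = "true" if preserve_client_ip else "false"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                LB_TYPE: "nlb",
                TARGET_GROUP_ATTRS: f"preserve_client_ip.enabled={flag}",
            },
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": selector,
            "ports": [{"port": port, "targetPort": port, "protocol": "TCP"}],
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    manifest = nlb_service_manifest(
        "example-nlb", "example-ns", {"app": "example"}, 443,
        preserve_client_ip=False,
    )
    print(json.dumps(manifest, indent=2))
```

      The printed JSON can be applied like any other manifest; only the annotation value changes between the preserved and non-preserved variants.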

      Acceptance Criteria

      • Spikes are run to further define scope of changes
      • Cards are opened which define the work to get to tech preview
      • PM signs off on scope

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      Open questions:

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

      Additional References:

              rhn-support-mrbraga Marco Braga
              rhn-support-rvanderp Richard Vanderpool