OCPBUGS-54702

Ensure AWS NLB load balancers don't drop HAProxy frontend connections


    • Resolution: Unresolved
    • Priority: Major
    • Affects Version: 4.18
    • Component: Networking / router
    • Quality / Stability / Reliability
    • Severity: Moderate
    • Sprint: NI&D Sprint 270, NI&D Sprint 271, NI&D Sprint 272, NI&D Sprint 274, NI&D Sprint 278
    • Story Points: 5

      Description of problem

      Currently the default idle timeout for an AWS NLB is 350s, but the IngressController sets an HAProxy tunnel timeout of 1h. This mismatch causes customers to see a high number of idle connection timeouts in HAProxy and TCP resets on the client side.

      We should align the NLB TCP idle timeout with the HAProxy tunnel timeout, as we already do for Classic ELBs, and support a customer-settable timeout annotation for NLBs to bring them to parity with Classic ELBs.

      Required upstream feature: https://github.com/kubernetes-sigs/aws-load-balancer-controller/pull/3863
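      For reference, below is a minimal sketch of the Classic ELB mechanism that the upstream feature would bring to NLBs: the idle timeout is driven by an annotation on the router's LoadBalancer Service. The sketch assumes the kubernetes Python client, cluster credentials, and the default router Service name and namespace (router-default in openshift-ingress).

```python
# Illustrative sketch only: this shows the Classic ELB mechanism the NLB feature
# would mirror, not an existing NLB knob. Assumes the kubernetes Python client,
# cluster credentials, and the default router Service (router-default / openshift-ingress).
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

patch = {
    "metadata": {
        "annotations": {
            # Honored by the AWS cloud provider for Classic ELBs; value is seconds.
            "service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout": "3600",
        }
    }
}

# Patch the router's LoadBalancer Service with the idle-timeout annotation.
core.patch_namespaced_service("router-default", "openshift-ingress", patch)
```

      Note that the ingress operator manages this Service, so for Classic ELBs the timeout is normally configured through the IngressController's AWS provider parameters rather than by patching the Service directly; the sketch only illustrates the annotation-level mechanism that currently has no NLB equivalent.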

      The main issue is that NLBs do not terminate idle connections; they only evict the 5-tuple from their flow-tracking table. As a result, a client that subsequently reuses the connection receives a TCP RST.

      Related: https://aws.amazon.com/blogs/networking-and-content-delivery/introducing-nlb-tcp-configurable-idle-timeout/

      A second option is to reduce the default tunnel timeout to below 350s when the load balancer type is NLB, until the aws-load-balancer-listener-attributes support is available. This causes HAProxy to close established persistent connections before the NLB drops the flow, effectively signaling the client to reopen the connection.
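      A minimal sketch of that interim mitigation, assuming the kubernetes Python client, the default IngressController (default in the openshift-ingress-operator namespace), and that spec.tuningOptions.tunnelTimeout is the field used to lower the HAProxy tunnel timeout:

```python
# Hedged sketch: lower the HAProxy tunnel timeout below the NLB's 350s flow-idle
# timeout so HAProxy closes idle tunnels first and the client reconnects cleanly.
from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

custom.patch_namespaced_custom_object(
    group="operator.openshift.io",
    version="v1",
    namespace="openshift-ingress-operator",
    plural="ingresscontrollers",
    name="default",
    body={"spec": {"tuningOptions": {"tunnelTimeout": "300s"}}},
)
```

      The trade-off is that long-lived idle tunnels (for example WebSockets) are then closed by HAProxy after roughly five minutes instead of one hour, so clients must be prepared to reconnect.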

      Version-Release number of selected component (if applicable)

      4.x

      How reproducible

      Steps to Reproduce

      1. Deploy any OCP cluster that uses an NLB by default (e.g. ROSA HCP).
      2. Idle a connection through the ingress for more than 350s.
      3. Attempt to send a request over the same connection (a reproduction sketch follows below).
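      A reproduction sketch for steps 2 and 3, assuming the third-party websockets package and a WebSocket echo endpoint published through the NLB-fronted ingress (the URL below is a placeholder). A WebSocket is used because the 1h tunnel timeout is what keeps the HAProxy side of the connection open past the NLB's 350s:

```python
# Hedged reproduction sketch: idle an upgraded (tunnel) connection past the NLB's
# 350s flow-idle timeout, then reuse it and observe the client-side reset.
import asyncio
import websockets

ROUTE_URL = "wss://echo.apps.example.com/ws"  # placeholder route URL

async def main():
    # ping_interval=None keeps the connection truly idle (no keepalive pings).
    async with websockets.connect(ROUTE_URL, ping_interval=None) as ws:
        await ws.send("hello")            # first round trip succeeds
        print(await ws.recv())

        await asyncio.sleep(400)          # idle longer than the NLB's 350s timeout

        try:
            # The NLB has evicted the flow, so this frame is answered with a
            # TCP RST instead of being forwarded to HAProxy.
            await ws.send("still there?")
            print(await ws.recv())
        except (ConnectionResetError, websockets.ConnectionClosedError) as exc:
            print("connection reset:", exc)

asyncio.run(main())
```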

      Actual results

      The client receives a TCP reset when it sends a request over the previously idle connection.

      Expected results

      The idle connection should be closed cleanly by the remote end (HAProxy) before the NLB evicts the flow, so the client reopens a connection instead of receiving a TCP reset.

      Assignee: Miciah Masters (mmasters1@redhat.com)
      Reporter: Tim Dawson (rhn-support-tidawson)
      QA Contact: Hongan Li