Bug, Resolution: Unresolved, Major, 4.18, Quality / Stability / Reliability, Moderate, Rejected
Sprints: NI&D Sprint 270, NI&D Sprint 271, NI&D Sprint 272, NI&D Sprint 274, NI&D Sprint 278
Description of problem
Currently the default idle timeout for NLB is 350s, but the IngressController sets a tunnel timeout of 1h. Because of this mismatch, customers experience a high number of idle connection timeouts in HAProxy and TCP resets on the client side.
We should better align the NLB TCP idle timeout with the HAProxy tunnel timeout, as we do for Classic ELB, and allow customers to set a custom timeout annotation for NLB to bring it to parity with Classic ELBs.
Required upstream feature: https://github.com/kubernetes-sigs/aws-load-balancer-controller/pull/3863
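For reference, a minimal sketch of the annotations involved on the router's LoadBalancer Service (assuming the default router-default Service in openshift-ingress). The Classic ELB idle-timeout annotation already exists; the NLB listener-attributes key and attribute name shown below are assumptions based on the linked PR and are not yet available:

apiVersion: v1
kind: Service
metadata:
  name: router-default
  namespace: openshift-ingress
  annotations:
    # Existing Classic ELB knob: idle timeout in seconds.
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600"
    # Prospective NLB equivalent from the upstream PR; the exact annotation
    # suffix and attribute key are assumptions, not a released API.
    service.beta.kubernetes.io/aws-load-balancer-listener-attributes.TCP-443: "tcp.idle_timeout.seconds=3600"
spec:
  type: LoadBalancer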
The main issue is that NLBs do not terminate idle connections; they only evict the 5-tuple from flow tracking. This causes the client to receive a TCP RST when it subsequently uses the connection.
The second option is to reduce the default tunnel timeout (below 350s) when the load balancer type is NLB, until the aws-load-balancer-listener-attributes annotation is available. This causes HAProxy to close established persistent connections before the NLB drops the flow, effectively signaling the client to reopen the connection.
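A minimal sketch of that mitigation, assuming it is applied to the default IngressController (the ingress operator would need to set an equivalent default when it detects an NLB):

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  tuningOptions:
    # Keep HAProxy's tunnel timeout below the 350s NLB flow-idle timeout so
    # HAProxy closes idle tunnels cleanly before the NLB evicts the flow.
    tunnelTimeout: 5m

This could be applied with oc patch against the default IngressController, or set as a default by the ingress operator when the load balancer type is NLB.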
Version-Release number of selected component (if applicable)
4.x
How reproducible
Steps to Reproduce
1. Deploy any OCP that uses NLB by default (e.g. ROSA HCP).
2. Idle a connection via ingress for more than 350s.
3. Attempt to send a request.
Actual results
Client receives a TCP reset when sending a request.
Expected results
The idle connection should be gracefully closed by the remote end (HAProxy) before the NLB evicts the flow, rather than the client receiving a TCP reset.