Bug, Resolution: Unresolved, Major, 4.18, Quality / Stability / Reliability, Moderate, Rejected
Sprints: NI&D Sprint 270, NI&D Sprint 271, NI&D Sprint 272, NI&D Sprint 274, NI&D Sprint 278
Description of problem
Currently the default idle timeout for NLB is 350s, but the IngressController sets a tunnel timeout of 1h. Because of this mismatch, customers experience a high number of idle connection timeouts in HAProxy and TCP resets on the client side.
We should better align the NLB TCP idle timeout with the HAProxy tunnel timeout, as we do for Classic ELB, and allow customers to set a custom timeout annotation for NLB to bring it to parity with Classic ELBs.
Required upstream feature: https://github.com/kubernetes-sigs/aws-load-balancer-controller/pull/3863
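For reference, a minimal sketch of the annotations involved on the router's LoadBalancer Service (assuming the default router-default Service in openshift-ingress). The Classic ELB idle-timeout annotation already exists; the NLB listener-attributes key and attribute name shown below are assumptions based on the linked PR and are not yet available:

apiVersion: v1
kind: Service
metadata:
  name: router-default
  namespace: openshift-ingress
  annotations:
    # Existing Classic ELB knob: idle timeout in seconds.
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600"
    # Prospective NLB equivalent from the upstream PR; the exact annotation
    # suffix and attribute key are assumptions, not a released API.
    service.beta.kubernetes.io/aws-load-balancer-listener-attributes.TCP-443: "tcp.idle_timeout.seconds=3600"
spec:
  type: LoadBalancer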
The main issue is that NLBs do not terminate idle connections; they only evict the 5-tuple from flow tracking. This causes the client to receive a TCP RST when it subsequently uses the connection.
The second option is to reduce the default tunnel timeout (below 350s) when the load balancer type is NLB, until the aws-load-balancer-listener-attributes annotation is available. This causes HAProxy to close established persistent connections before the NLB drops the flow, effectively signaling the client to reopen the connection.
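A minimal sketch of that mitigation, assuming it is applied to the default IngressController (the ingress operator would need to set an equivalent default when it detects an NLB):

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  tuningOptions:
    # Keep HAProxy's tunnel timeout below the 350s NLB flow-idle timeout so
    # HAProxy closes idle tunnels cleanly before the NLB evicts the flow.
    tunnelTimeout: 5m

This could be applied with oc patch against the default IngressController, or set as a default by the ingress operator when the load balancer type is NLB.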
Version-Release number of selected component (if applicable)
4.x
How reproducible
Steps to Reproduce
1. Deploy any OCP that uses NLB by default (e.g. ROSA HCP).
2. Idle a connection via ingress for more than 350s.
3. Attempt to send a request.
Actual results
Client receives a TCP reset when sending a request.
Expected results
The idle connection should be gracefully closed by the remote end (HAProxy) before the NLB evicts the flow, rather than the client receiving a TCP reset.