
OCPBUGS-33883: HAProxy consuming high CPU usage (release 4.15)


      * Previously, in HAProxy 2.6 deployments on OpenShift, shutting down HAProxy could result in a race condition. The main thread (`tid=0`) would wait for the other threads to complete, but some threads would enter an
      infinite loop, consuming 100% CPU. With this release, the variable
      controlling the loop's termination is properly reset, preventing non-main threads from looping indefinitely and ensuring that each thread's poll loop can terminate correctly. (link:https://issues.redhat.com/browse/OCPBUGS-33883[*OCPBUGS-33883*])
      ______________
      Previous Behaviour:

      In HAProxy 2.6 deployments on OpenShift, shutting down HAProxy
      could result in a race condition. The main thread (tid=0) would wait
      for the other threads to complete, but some threads would enter an
      infinite loop, consuming 100% CPU. The loop consisted of continuous
      syscalls such as epoll_wait and clock_gettime.

      Fixed Behaviour:

      The issue was addressed in the upstream issue
      https://github.com/haproxy/haproxy/issues/2537, and the fix is carried in
      the following RPM version: haproxy-2.6.13-3.rhaos4.15.el8.

      The fix ensures that the thread's poll loop can terminate correctly. The variable
      controlling the loop's termination is now properly reset, preventing non-main threads from looping indefinitely. This change ensures that the "stopping" part of the run_poll_loop() function is reached,
      allowing all threads to exit cleanly during shutdown, except for the
      main thread (tid=0), which handles signals.
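
      The sketch below is a minimal, hypothetical C illustration of the shutdown pattern described above; it is not the HAProxy source, and names such as `stopping` and `poll_loop()` are illustrative only. Worker threads spin in an epoll_wait() loop and leave it once they observe a shared stopping flag that the main thread publishes before joining them. In the reported bug, the variable a non-main thread checked was not reset correctly, so its loop condition never became false and the thread kept issuing epoll_wait()/clock_gettime() syscalls at 100% CPU.

      /*
       * Hypothetical sketch of the shutdown pattern (not HAProxy source).
       * Each worker runs a poll loop on its own epoll fd and exits once it
       * observes the shared `stopping` flag.  If the flag a worker checks
       * were never (re)set for it, the loop would keep calling epoll_wait()
       * and spin, which is the 100% CPU symptom described above.
       */
      #include <pthread.h>
      #include <stdatomic.h>
      #include <stdio.h>
      #include <sys/epoll.h>
      #include <unistd.h>

      #define NB_WORKERS 4

      static atomic_int stopping;   /* published by the main thread on shutdown */

      static void *poll_loop(void *arg)
      {
          (void)arg;
          int epfd = epoll_create1(0);
          struct epoll_event ev[8];

          /* Worker poll loop: terminates only when the stopping flag is seen. */
          while (!atomic_load(&stopping)) {
              /* Short timeout so shutdown is noticed promptly; a loop whose
               * exit condition is never satisfied would spin here instead. */
              epoll_wait(epfd, ev, 8, 100);
          }

          close(epfd);
          return NULL;
      }

      int main(void)
      {
          pthread_t tid[NB_WORKERS];

          for (int i = 0; i < NB_WORKERS; i++)
              pthread_create(&tid[i], NULL, poll_loop, NULL);

          sleep(1);   /* simulated runtime */

          /* Shutdown: publish the stopping flag, then wait for the workers,
           * mirroring the main thread (tid=0) waiting on the other threads. */
          atomic_store(&stopping, 1);
          for (int i = 0; i < NB_WORKERS; i++)
              pthread_join(tid[i], NULL);

          puts("all worker threads exited cleanly");
          return 0;
      }

      Compiled with gcc -pthread, this sketch exits cleanly; if the store to `stopping` is removed, the workers never leave their poll loops, mirroring the stuck shutdown described above.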

      Description of problem:

      This was originally observed on an HAProxy 2.6 deployment in OpenShift.
      When shutting down, we observe a race condition in which the thread with tid=0 waits for the other threads to complete, while some of the remaining threads loop at 100% CPU, continuously issuing the epoll_wait and clock_gettime syscalls.

      The customer has raised a GitHub issue for this HAProxy behaviour:

      https://github.com/haproxy/haproxy/issues/2537
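
      As a purely illustrative aside (not HAProxy code), the small C program below produces the same syscall signature as the stuck threads: each iteration re-reads the clock and polls with a zero timeout, so under strace the thread shows nothing but epoll_wait and clock_gettime calls while pegging a CPU.

      /* Illustrative only: reproduces the epoll_wait/clock_gettime spin
       * visible on the affected threads, bounded so the example terminates. */
      #include <stdio.h>
      #include <sys/epoll.h>
      #include <time.h>
      #include <unistd.h>

      int main(void)
      {
          int epfd = epoll_create1(0);
          struct epoll_event ev[8];
          struct timespec now;
          long iterations = 0;

          /* On the affected threads the equivalent loop never ends. */
          while (iterations++ < 1000000) {
              clock_gettime(CLOCK_MONOTONIC, &now);  /* recompute "deadline" */
              epoll_wait(epfd, ev, 8, 0);            /* returns immediately  */
          }

          close(epfd);
          printf("performed %ld poll iterations\n", iterations - 1);
          return 0;
      }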

      Version-Release number of selected component (if applicable):

          HAProxy 2.6 on OpenShift 4.15; the fix is carried in haproxy-2.6.13-3.rhaos4.15.el8.

      How reproducible:

          See the upstream issue: https://github.com/haproxy/haproxy/issues/2537

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          

      Expected results:

          

      Additional info:

          
