Bug
Resolution: Unresolved
Major
rhel-8.10
rhel-sst-cs-stacks
ssg_core_services
x86_64
What were you trying to do that didn't work?
If a client aborts a single request (that is, resets one HTTP/2 stream), this triggers some DoS protections in mod_http2, because the reset is handled on the session here:
https://github.com/icing/mod_h2/blob/v1.15.19/mod_http2/h2_session.c#L402
https://github.com/icing/mod_h2/blob/v1.15.19/mod_http2/h2_mplx.c#L1143
This in turn calls m_be_annoyed:
https://github.com/icing/mod_h2/blob/v1.15.19/mod_http2/h2_mplx.c#L987
The first time this occurs, the maximum number of active tasks for this session (the "worker limit" in the log) is lowered to at most 16, and it is lowered further on subsequent invocations. Any current tasks above the lowered limit are then unscheduled, and the requests unscheduled this way fail (the browser reports ERR_HTTP2_PROTOCOL_ERROR). With http2 trace logging enabled, the tasks unscheduled after the reset appear as follows:
[Tue Sep 10 17:05:20.314728 2024] [http2:debug] [pid 1768919:tid 1769073] h2_session.c(347): [client 127.0.0.1:57224] AH03066: h2_session(170,BUSY,20): recv FRAME[RST_STREAM[length=4, flags=0, stream=351]], frames=248/2051 (r/s)
[Tue Sep 10 17:05:20.314735 2024] [http2:debug] [pid 1768919:tid 1769073] h2_session.c(396): [client 127.0.0.1:57224] AH03067: h2_stream(170-351): RST_STREAM by client, error=8
[Tue Sep 10 17:05:20.314742 2024] [http2:trace1] [pid 1768919:tid 1769073] h2_stream.c(302): [client 127.0.0.1:57224] h2_stream(170-351,HALF_CLOSED_REMOTE): transit to [CLOSED]
[Tue Sep 10 17:05:20.314748 2024] [http2:trace1] [pid 1768919:tid 1769073] h2_stream.c(211): [client 127.0.0.1:57224] h2_stream(170-351,CLOSED): closing input
[Tue Sep 10 17:05:20.314755 2024] [http2:trace2] [pid 1768919:tid 1769073] h2_session.c(1976): [client 127.0.0.1:57224] h2_stream(170-351,CLOSED): entered state
[Tue Sep 10 17:05:20.314764 2024] [http2:trace1] [pid 1768919:tid 1769073] h2_mplx.c(1002): [client 127.0.0.1:57224] h2_mplx(170): mood update, decreasing worker limit to 16
[Tue Sep 10 17:05:20.314800 2024] [http2:trace2] [pid 1768919:tid 1769073] h2_mplx.c(949): [client 127.0.0.1:57224] h2_mplx(184-379): unschedule, resetting task for redo later
[Tue Sep 10 17:05:20.314810 2024] [http2:trace2] [pid 1768919:tid 1769073] h2_mplx.c(949): [client 127.0.0.1:57224] h2_mplx(184-377): unschedule, resetting task for redo later
[Tue Sep 10 17:05:20.314820 2024] [http2:trace2] [pid 1768919:tid 1769073] h2_mplx.c(949): [client 127.0.0.1:57224] h2_mplx(184-375): unschedule, resetting task for redo later
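The mechanism described above can be illustrated with a small toy model. This is only a sketch of the behavior as reported in this bug, not the mod_http2 implementation: the class `MplxModel`, its fields, the starting limit of 32, the halving step for later invocations, and the choice to drop the newest tasks first are all assumptions made for illustration. The `unschedule` flag shows the difference between the current behavior and a variant where only the limit is lowered.

```python
from dataclasses import dataclass, field


@dataclass
class MplxModel:
    """Toy model of a per-session multiplexer (hypothetical, not mod_http2 code)."""
    limit_active: int = 32                       # assumed starting worker limit
    active: list = field(default_factory=list)   # stream ids with a running task
    redo: list = field(default_factory=list)     # stream ids reset "for redo later"

    def be_annoyed(self, unschedule: bool = True) -> list:
        """React to a client reset: lower the limit, optionally unschedule tasks."""
        if self.limit_active > 16:
            # First reaction: cap the limit at 16, as seen in the trace log.
            self.limit_active = 16
        elif self.limit_active > 2:
            # Later reactions lower it further; halving is an assumption.
            self.limit_active //= 2

        dropped = []
        if unschedule:
            # Tasks above the new limit are unscheduled. In the reported bug
            # these are the requests that end up failing on the client.
            while len(self.active) > self.limit_active:
                dropped.append(self.active.pop())   # newest first (assumption)
        self.redo.extend(dropped)
        return dropped


if __name__ == "__main__":
    # 20 concurrent streams with odd ids 1, 3, ..., 39
    m = MplxModel(limit_active=32, active=list(range(1, 40, 2)))
    print("unscheduled after first reset:", m.be_annoyed())
    print("new limit:", m.limit_active, "still active:", len(m.active))
```

In this model, calling `be_annoyed(unschedule=False)` still lowers the limit, so new work is throttled, but tasks that are already running are left alone.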
We need a backport of the following fix, which removes that task unscheduling:
What is the impact of this issue to you?
Seemingly random HTTP/2 failures
Please provide the package NVR for which the bug is seen:
1.15.7-10.module+el8.10.0+21653+eaff63f0
How reproducible is this bug?:
Consistently, under highly concurrent request traffic that includes a client reset
Expected results
No other request failures
Actual results
Other concurrent requests can fail when a client resets a single stream
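To check whether a given failure matches this report, the error log can be scanned for the three message types quoted in the trace above. The following is a minimal sketch, assuming the log was captured with http2 trace logging and that the message wording matches the excerpt in this report. The function name `summarize` and the shape of its result are hypothetical.

```python
import re

# Patterns are derived from the log excerpt in this report; other
# mod_http2 versions may word these messages differently.
RST = re.compile(r"h2_stream\((\d+)-(\d+)\): RST_STREAM by client")
MOOD = re.compile(r"h2_mplx\((\d+)\): mood update, decreasing worker limit to (\d+)")
UNSCHED = re.compile(r"h2_mplx\((\d+)-(\d+)\): unschedule, resetting task")


def summarize(text: str) -> dict:
    """Collect client resets, worker-limit decreases and unscheduled tasks."""
    def pairs(pattern):
        return [(int(a), int(b)) for a, b in pattern.findall(text)]

    return {
        "resets": pairs(RST),           # (session id, stream id)
        "limits": pairs(MOOD),          # (session id, new worker limit)
        "unscheduled": pairs(UNSCHED),  # (session id, stream id)
    }


if __name__ == "__main__":
    import sys
    # Usage: python scan_h2_log.py /path/to/error_log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        result = summarize(fh.read())
    for key, values in result.items():
        print(key, len(values), values[:10])
```

A non-empty `unscheduled` list shortly after an entry in `resets` and `limits` suggests the same sequence of events as in the excerpt.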