RHEL-59017

Random failures in other requests on an HTTP/2 connection when the client resets one request

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • rhel-8.10
    • mod_http2
    • rhel-sst-cs-stacks
    • ssg_core_services
    • x86_64

      What were you trying to do that didn't work?

      If a client aborts one request on an HTTP/2 connection, the reset triggers DoS protection in mod_http2. The reset is handled on the session here:
      https://github.com/icing/mod_h2/blob/v1.15.19/mod_http2/h2_session.c#L402
      https://github.com/icing/mod_h2/blob/v1.15.19/mod_http2/h2_mplx.c#L1143
      This in turn calls m_be_annoyed:
      https://github.com/icing/mod_h2/blob/v1.15.19/mod_http2/h2_mplx.c#L987
      The first time this occurs, it lowers the maximum number of active tasks allowed on the connection to 16, and subsequent invocations lower it further. It then unschedules any currently running tasks above the lowered limit, and every request unscheduled this way fails (the browser reports ERR_HTTP2_PROTOCOL_ERROR). With http2 trace logging enabled, the tasks unscheduled after the reset appear as follows:

      [Tue Sep 10 17:05:20.314728 2024] [http2:debug] [pid 1768919:tid 1769073] h2_session.c(347): [client 127.0.0.1:57224] AH03066: h2_session(170,BUSY,20): recv FRAME[RST_STREAM[length=4, flags=0, stream=351]], frames=248/2051 (r/s)
      [Tue Sep 10 17:05:20.314735 2024] [http2:debug] [pid 1768919:tid 1769073] h2_session.c(396): [client 127.0.0.1:57224] AH03067: h2_stream(170-351): RST_STREAM by client, error=8
      [Tue Sep 10 17:05:20.314742 2024] [http2:trace1] [pid 1768919:tid 1769073] h2_stream.c(302): [client 127.0.0.1:57224] h2_stream(170-351,HALF_CLOSED_REMOTE): transit to [CLOSED]
      [Tue Sep 10 17:05:20.314748 2024] [http2:trace1] [pid 1768919:tid 1769073] h2_stream.c(211): [client 127.0.0.1:57224] h2_stream(170-351,CLOSED): closing input
      [Tue Sep 10 17:05:20.314755 2024] [http2:trace2] [pid 1768919:tid 1769073] h2_session.c(1976): [client 127.0.0.1:57224] h2_stream(170-351,CLOSED): entered state
      [Tue Sep 10 17:05:20.314764 2024] [http2:trace1] [pid 1768919:tid 1769073] h2_mplx.c(1002): [client 127.0.0.1:57224] h2_mplx(170): mood update, decreasing worker limit to 16
      [Tue Sep 10 17:05:20.314800 2024] [http2:trace2] [pid 1768919:tid 1769073] h2_mplx.c(949): [client 127.0.0.1:57224] h2_mplx(184-379): unschedule, resetting task for redo later
      [Tue Sep 10 17:05:20.314810 2024] [http2:trace2] [pid 1768919:tid 1769073] h2_mplx.c(949): [client 127.0.0.1:57224] h2_mplx(184-377): unschedule, resetting task for redo later
      [Tue Sep 10 17:05:20.314820 2024] [http2:trace2] [pid 1768919:tid 1769073] h2_mplx.c(949): [client 127.0.0.1:57224] h2_mplx(184-375): unschedule, resetting task for redo later
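
      To check whether a given error log shows the same pattern, the three messages above can be pulled out with a small script. This is only a sketch: the regular expressions are written against the lines quoted in this report, the `summarize` helper and its result keys are made up for illustration, and other mod_http2 versions may word these messages differently.

```python
import re

# Patterns written against the log lines quoted in this report (assumption:
# other mod_http2 versions may format these messages differently).
RST_RE = re.compile(r"h2_stream\((\d+)-(\d+)\): RST_STREAM by client, error=(\d+)")
LIMIT_RE = re.compile(r"mood update, decreasing worker limit to (\d+)")
UNSCHED_RE = re.compile(r"h2_mplx\((\d+-\d+)\): unschedule, resetting task for redo later")


def summarize(lines):
    """Collect client resets, worker-limit decreases and unscheduled task ids."""
    summary = {"resets": [], "limits": [], "unscheduled": []}
    for line in lines:
        if (m := RST_RE.search(line)):
            # (session id, stream id, HTTP/2 error code)
            summary["resets"].append((int(m.group(1)), int(m.group(2)), int(m.group(3))))
        elif (m := LIMIT_RE.search(line)):
            summary["limits"].append(int(m.group(1)))
        elif (m := UNSCHED_RE.search(line)):
            summary["unscheduled"].append(m.group(1))
    return summary


# Lines copied verbatim from the excerpt in this report.
SAMPLE = """\
[Tue Sep 10 17:05:20.314735 2024] [http2:debug] [pid 1768919:tid 1769073] h2_session.c(396): [client 127.0.0.1:57224] AH03067: h2_stream(170-351): RST_STREAM by client, error=8
[Tue Sep 10 17:05:20.314764 2024] [http2:trace1] [pid 1768919:tid 1769073] h2_mplx.c(1002): [client 127.0.0.1:57224] h2_mplx(170): mood update, decreasing worker limit to 16
[Tue Sep 10 17:05:20.314800 2024] [http2:trace2] [pid 1768919:tid 1769073] h2_mplx.c(949): [client 127.0.0.1:57224] h2_mplx(184-379): unschedule, resetting task for redo later
[Tue Sep 10 17:05:20.314810 2024] [http2:trace2] [pid 1768919:tid 1769073] h2_mplx.c(949): [client 127.0.0.1:57224] h2_mplx(184-377): unschedule, resetting task for redo later
[Tue Sep 10 17:05:20.314820 2024] [http2:trace2] [pid 1768919:tid 1769073] h2_mplx.c(949): [client 127.0.0.1:57224] h2_mplx(184-375): unschedule, resetting task for redo later
""".splitlines()

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

      On the excerpt above this reports one client reset (stream 351, error code 8, which is CANCEL in HTTP/2), one limit decrease to 16, and three unscheduled tasks.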
      

      We need a backport of the following fix, which removes the task unscheduling:

      https://github.com/apache/httpd/pull/317/commits/ae6dedd0acc246a4ce6be2fa8fc64b7d1851bdae#diff-3e0956e7d573ccb745c0377b22eb6b15d6599c14cd04643b99837d4b946adac2
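
      The effect of the unscheduling, and of removing it, can be illustrated with a toy model. This is not the mod_http2 code: `ToyMplx` and its fields are hypothetical, the timing conditions in m_be_annoyed are ignored, and the step-down ladder below 16 (8, 4, 2) is an assumption, since this report only confirms the first step. The model also assumes that the limit is still lowered once the unscheduling is removed.

```python
from dataclasses import dataclass, field

# Assumed step-down ladder. Only the first step (16) is confirmed by the
# log excerpt in this report; the lower steps are an assumption.
LIMIT_STEPS = (16, 8, 4, 2)


@dataclass
class ToyMplx:
    """Hypothetical, simplified stand-in for the per-connection multiplexer."""
    limit_active: int
    active: list[int] = field(default_factory=list)  # stream ids, oldest first
    unschedule_on_reset: bool = True  # False models the requested fix

    def on_client_reset(self, stream_id: int) -> list[int]:
        """Handle a client reset; return ids of OTHER streams that get unscheduled."""
        if stream_id in self.active:
            self.active.remove(stream_id)
        # Lower the limit to the next step strictly below the current value.
        for step in LIMIT_STEPS:
            if self.limit_active > step:
                self.limit_active = step
                break
        if not self.unschedule_on_reset:
            return []  # running tasks are left alone
        victims = []
        # Drop the most recently started tasks until we are within the limit.
        while len(self.active) > self.limit_active:
            victims.append(self.active.pop())
        return victims


if __name__ == "__main__":
    streams = list(range(1, 41, 2))  # 20 concurrent streams: 1, 3, ..., 39
    current = ToyMplx(limit_active=32, active=list(streams))
    print("unscheduled with current behaviour:", current.on_client_reset(1))
    fixed = ToyMplx(limit_active=32, active=list(streams), unschedule_on_reset=False)
    print("unscheduled with unscheduling removed:", fixed.on_client_reset(1))
```

      In this model, a single reset with 20 requests in flight takes down three unrelated requests under the current behaviour and none once the unscheduling is removed.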

      What is the impact of this issue to you?

      Seemingly random HTTP/2 failures

      Please provide the package NVR for which the bug is seen:

      1.15.7-10.module+el8.10.0+21653+eaff63f0

      How reproducible is this bug?:

      Consistently, under highly concurrent request traffic that includes a client reset

      Expected results

      No other request failures

      Actual results

      Other concurrent requests can fail when a client resets one request

              luhliari@redhat.com Lubos Uhliarik
              rhn-support-aogburn Aaron Ogburn
              Lubos Uhliarik
              rhel-cs-infra-services-qe