  1. Undertow
  2. UNDERTOW-2655

Text corruption with multi-byte characters split across buffer boundaries in FileUtils.readFile


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • 2.3.11.Final, 2.3.13.Final, 2.3.12.Final, 2.3.14.Final, 2.3.15.Final, 2.3.16.Final, 2.3.19.Final, 2.3.17.Final, 2.3.18.Final, 2.4.0.Alpha1, 2.3.20.Final
    • Core
    • None

      1. Create a large payload: Construct a string using a multi-byte character repeated enough times to ensure it will be split across multiple 1024-byte buffers.

      • For example, use the Korean character '한' (a 3-byte character in UTF-8) repeated 6,000 times. The total payload size will be 18,000 bytes, which exceeds the 16 KiB file threshold from UNDERTOW-2337.

      2. Send the request: Send this string as the value of a field in a multipart/form-data POST request to an Undertow server.

      3. Process the data: On the server, read the value of the form field using `FormData#getValue()`. This will trigger the use of `FileUtils.readFile` for the temporary file.
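The payload from step 1 can be sketched as follows. This is only an illustration: the class and method names (`PayloadSketch`, `buildPayload`, `splitsCharacter`) are hypothetical, the code needs Java 11+ for `String.repeat`, and it assumes the field value starts at offset 0 of the temporary file and that every read fills a full 1024-byte buffer.

```java
import java.nio.charset.StandardCharsets;

public class PayloadSketch {
    // Buffer size and file threshold as stated in this report.
    static final int BUFFER_SIZE = 1024;
    static final int FILE_THRESHOLD = 16 * 1024;

    // Builds the test value: U+D55C (the Korean character from step 1),
    // written as an escape so the source file encoding does not matter.
    static String buildPayload(int repeats) {
        return "\uD55C".repeat(repeats);
    }

    // Returns true if the byte at 'offset' is a UTF-8 continuation byte
    // (bit pattern 10xxxxxx), meaning a chunk boundary placed at that
    // offset would cut a multi-byte character in two.
    static boolean splitsCharacter(byte[] utf8, int offset) {
        return offset < utf8.length && (utf8[offset] & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        byte[] bytes = buildPayload(6000).getBytes(StandardCharsets.UTF_8);
        System.out.println("payload bytes: " + bytes.length);
        System.out.println("exceeds file threshold: " + (bytes.length > FILE_THRESHOLD));
        System.out.println("boundary at 1024 splits a character: "
                + splitsCharacter(bytes, BUFFER_SIZE));
    }
}
```

Because 1024 is not a multiple of 3, the first buffer boundary already falls inside a character under these assumptions, so a payload of 6,000 repetitions (18,000 bytes) is more than enough to hit the problem.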

    • Low

      The `io.undertow.util.FileUtils.readFile` method can corrupt text when reading streams that contain multi-byte characters (such as those in UTF-8).

      The root cause is that the method reads the `InputStream` into a fixed-size byte buffer (1024 bytes) and decodes each chunk independently. If a multi-byte character sequence is split across a buffer boundary, the decoder receives incomplete character data for that chunk, resulting in replacement characters in the final string.
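The failure mode described above can be reproduced outside Undertow with a minimal sketch. This is not the actual `FileUtils.readFile` source; `ChunkDecodeSketch`, `decodePerChunk`, and `decodeWhole` are hypothetical names that only mimic the pattern in the description (a fixed 1024-byte buffer, each chunk decoded on its own). Buffering all bytes before decoding is shown as one possible remedy, not necessarily the fix the project will adopt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ChunkDecodeSketch {
    // Problematic pattern: every chunk is decoded independently, so a
    // character split across two chunks is malformed in both of them.
    static String decodePerChunk(InputStream in) {
        try {
            StringBuilder sb = new StringBuilder();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(new String(buf, 0, n, StandardCharsets.UTF_8));
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One possible remedy: collect all bytes first, then decode once.
    static String decodeWhole(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\uD55C".repeat(6000); // 3 bytes per character in UTF-8
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);

        String chunked = decodePerChunk(new ByteArrayInputStream(bytes));
        String whole = decodeWhole(new ByteArrayInputStream(bytes));

        System.out.println("per-chunk result contains U+FFFD: " + chunked.contains("\uFFFD"));
        System.out.println("whole-stream result equals input: " + whole.equals(original));
    }
}
```

Wrapping the stream in an `InputStreamReader` with an explicit charset is another standard way to decode across read boundaries, since the reader's decoder carries incomplete byte sequences over to the next read.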

      This bug has a larger impact since the changes in UNDERTOW-2337, because large form-data field values are now read through this method. The issue was originally reported in the context of the Spring Framework as issue #35292.

              flaviarnn Flavia Rainone
              yjcltplzpz67kkcrlkydhlsvfbqdznaahkfokfztv04uq Jaeon Park