- Enhancement
- Resolution: Done
- Major
- 7.3.3.GA
- None
Characters are garbled when EAP 7/Undertow's FormEncodedDataDefinition parses raw (not percent-encoded) multibyte characters in POST request data. The issue does not occur on EAP 6 (JBossWeb) or Tomcat.
example application:
    ...(snip)...
    @WebServlet(name = "TestServlet", urlPatterns = {"/test"})
    public class TestServlet extends HttpServlet {
        ...(snip)...
        protected void doPost(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            request.setCharacterEncoding("UTF-8");
            response.setCharacterEncoding("UTF-8");
            response.setContentType("text/html;charset=UTF-8");
            try (PrintWriter out = response.getWriter()) {
                out.println("test = " + request.getParameter("test"));
            }
        }
result:
- EAP 7 (Undertow)
    $ curl http://localhost:8080/example/test -d "test=テスト"
    test =  ̄テニ ̄ツᄍ ̄テネ
- EAP 6 (JBossWeb) and JWS 5 Tomcat 9
    $ curl http://localhost:8080/example/test -d "test=テスト"
    test = テスト
—
IMO, as per the HTTP specification, the client should percent-encode (URL-encode) POST data when sending it as "Content-Type: application/x-www-form-urlencoded". Web browsers (HTTP-compliant clients) correctly percent-encode POST data. So I think the root cause of this issue is incorrect client behavior. In fact, the garbled characters do not appear when using the following curl command:
$ curl localhost:8080/test/ -d "test=%E3%83%86%E3%82%B9%E3%83%88"
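For reference, a minimal Java sketch of what a compliant client does before sending the body: it percent-encodes the UTF-8 bytes of each name and value. The class and method names here (FormBodyEncodingExample, encodeFormField) are hypothetical and only for illustration; URLEncoder is the standard JDK class.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FormBodyEncodingExample {
    // Builds one "name=value" pair as a compliant client would send it
    // for application/x-www-form-urlencoded (hypothetical helper).
    static String encodeFormField(String name, String value) {
        try {
            return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "\u30C6\u30B9\u30C8" is the katakana string used in the curl examples.
        System.out.println(encodeFormField("test", "\u30C6\u30B9\u30C8"));
        // prints: test=%E3%83%86%E3%82%B9%E3%83%88
    }
}
```

The printed body is the same string passed to curl with -d in the working command above.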
However, I think we can improve Undertow's FormEncodedDataDefinition slightly so that it is more compatible with EAP 6/JBossWeb and does not break raw multibyte characters in some cases.
Here's an analysis of implementation differences between EAP 6 and EAP 7:
- EAP 6/JBossWeb (org.apache.tomcat.util.http.Parameters#processParameters) parses parameters as bytes, then converts the byte arrays of each key and value to String through ByteChunk.
- EAP 7/Undertow (io.undertow.server.handlers.form.FormEncodedDataDefinition#doParse) also parses parameters as bytes. However, it converts each byte directly to one char by using "StringBuilder#append((char) n)", where n is a single byte. Because a multibyte character (such as a Japanese character) is encoded as several bytes, converting each byte to its own char cannot reproduce it, so the result is garbled.
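The difference between the two approaches can be reproduced outside either server. The following is a minimal sketch, assuming n is a Java byte as described above; it is not Undertow or JBossWeb code, and the class and method names are hypothetical. Casting a byte with the high bit set to char sign-extends it (0xE3 becomes U+FFE3), which appears consistent with the halfwidth/fullwidth-form characters in the garbled output.

```java
import java.nio.charset.StandardCharsets;

class ByteToCharGarblingExample {
    // Mimics the per-byte cast: every byte becomes its own char.
    static String castEachByte(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte n : raw) {
            sb.append((char) n); // negative bytes sign-extend, e.g. (byte) 0xE3 -> U+FFE3
        }
        return sb.toString();
    }

    // Keeps the bytes together and decodes them once with the charset.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Renders a string as space-separated UTF-16 code units in hex,
    // so the result does not depend on the console encoding.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // UTF-8 bytes of the katakana test string: E3 83 86 E3 82 B9 E3 83 88
        byte[] raw = "\u30C6\u30B9\u30C8".getBytes(StandardCharsets.UTF_8);
        System.out.println("per-byte cast: " + toHex(castEachByte(raw)));
        System.out.println("decoded once:  " + toHex(decodeAsUtf8(raw)));
    }
}
```

The per-byte cast yields nine unrelated chars, while decoding the whole byte array yields the original three characters.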
So, I think we can improve FormEncodedDataDefinition by using ByteArrayOutputStream instead of StringBuilder: https://github.com/undertow-io/undertow/compare/master...msfm:master_UNDERTOW-1802
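To illustrate the idea only (this is not the linked change, and it does not reproduce Undertow's parser state machine), here is a rough sketch of decoding one form component by collecting bytes in a ByteArrayOutputStream and converting to String once at the end. The class ByteBufferedFormDecoder and its methods are hypothetical, and its handling of malformed or mixed input is an assumption, not a statement about how Undertow or JBossWeb behaves.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteBufferedFormDecoder {
    // Returns the value of an ASCII hex digit, or -1 if the byte is not one.
    static int hexValue(byte b) {
        if (b >= '0' && b <= '9') return b - '0';
        if (b >= 'a' && b <= 'f') return b - 'a' + 10;
        if (b >= 'A' && b <= 'F') return b - 'A' + 10;
        return -1;
    }

    // Decodes one key or value: '+' becomes a space, %XX becomes one byte,
    // and any other byte (including raw multibyte data) is kept unchanged.
    // The accumulated bytes are decoded with the charset only once, at the end.
    static String decodeComponent(byte[] raw, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < raw.length; i++) {
            byte b = raw[i];
            if (b == '+') {
                out.write(' ');
            } else if (b == '%') {
                if (i + 2 >= raw.length) {
                    throw new IllegalArgumentException("Truncated percent escape");
                }
                int hi = hexValue(raw[i + 1]);
                int lo = hexValue(raw[i + 2]);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid percent escape");
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(b);
            }
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        byte[] rawForm = "\u30C6\u30B9\u30C8".getBytes(utf8);
        byte[] encodedForm = "%E3%83%86%E3%82%B9%E3%83%88".getBytes(StandardCharsets.US_ASCII);
        // Both inputs decode to the same three-character string.
        System.out.println(decodeComponent(rawForm, utf8).equals(decodeComponent(encodedForm, utf8)));
    }
}
```

Because the conversion to String happens after all bytes are collected, raw multibyte input and percent-encoded input both go through the same charset decoding step.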
Of course, if a parameter value contains both % and raw multibyte characters, it still cannot be parsed correctly, because the value cannot be decoded correctly as a percent-encoded value. However, I think this is not an issue because the behavior is the same as on EAP 6/JBossWeb.
- is cloned by
  UNDERTOW-1802 Improve FormEncodedDataDefinition to handle chars in configured encoding (Resolved)
- is incorporated by
  JBEAP-20288 [GSS] (7.3.z) Upgrade undertow from 2.0.32.SP1-redhat to 2.0.33.SP2-redhat (Closed)