Uploaded image for project: 'Undertow'
  1. Undertow
  2. UNDERTOW-1521

XNIO IO worker 100% CPU during SSL handshake

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Duplicate Issue
    • 2.0.19.Final
    • 2.0.20.Final
    • SSL
    • None
    • This issue is security relevant
    • Hide

      The connections in question were likely created by mobile devices entering power-save/sleep modes or loosing mobile network during/before handshake.

      To reproduce this I created a specialized test-case from the versatile ssltest project. The relevant fork can be found here:
      https://github.com/lyind/ssltest

      To reproduce against any server:
      1. GIT clone and build https://github.com/lyind/ssltest
      2. Execute java -jar ssltest.jar -connectonly TEST_SERVER TEST_PORT
      3. Watch Undertow eat CPU

      top -H -p UNDERTOW_VM_PID
      Show
      The connections in question were likely created by mobile devices entering power-save/sleep modes or loosing mobile network during/before handshake. To reproduce this I created a specialized test-case from the versatile ssltest project. The relevant fork can be found here: https://github.com/lyind/ssltest To reproduce against any server: 1. GIT clone and build https://github.com/lyind/ssltest 2. Execute java -jar ssltest.jar -connectonly TEST_SERVER TEST_PORT 3. Watch Undertow eat CPU top -H -p UNDERTOW_VM_PID

    Description

      Determined which threads keep the CPU busy:
      top -n 1 -H -p 281

        PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
        567 daemon    20   0 6516924 808552  28188 R 99.9  5.0   2317:52 XNIO-1 I/O-6
        569 daemon    20   0 6516924 808552  28188 R 87.5  5.0   2867:36 XNIO-1 I/O-7
        563 daemon    20   0 6516924 808552  28188 R 81.2  5.0   2318:09 XNIO-1 I/O-4
        573 daemon    20   0 6516924 808552  28188 R 75.0  5.0   2317:54 XNIO-1 I/O-8
        297 daemon    20   0 6516924 808552  28188 S  6.2  5.0  18:01.68 VM Thread
        506 daemon    20   0 6516924 808552  28188 S  0.0  5.0  10:31.10 GC Thread#1
        557 daemon    20   0 6516924 808552  28188 S  0.0  5.0   0:01.94 XNIO-1 I/O-1
        565 daemon    20   0 6516924 808552  28188 S  0.0  5.0   0:02.36 XNIO-1 I/O-5
        574 daemon    20   0 6516924 808552  28188 S  0.0  5.0   0:00.49 XNIO-1 Accept
      

      Sampled stack trace of offending thread:
      kill -3 281

       "XNIO-1 I/O-6" #23 prio=5 os_prio=0 cpu=139084467.28ms elapsed=492105.10s tid=0x00007f39cc628000 nid=0x237 runnable  [0x00007f3995506000]
          java.lang.Thread.State: RUNNABLE
       	at sun.security.ssl.SSLEngineImpl.getHandshakeStatus(java.base@11.0.2/SSLEngineImpl.java:805)
       	- locked <0x00000000dbe884a8> (a sun.security.ssl.SSLEngineImpl)
       	at sun.security.ssl.SSLEngineImpl.writeRecord(java.base@11.0.2/SSLEngineImpl.java:251)
       	at sun.security.ssl.SSLEngineImpl.wrap(java.base@11.0.2/SSLEngineImpl.java:136)
       	- eliminated <0x00000000dbe884a8> (a sun.security.ssl.SSLEngineImpl)
       	at sun.security.ssl.SSLEngineImpl.wrap(java.base@11.0.2/SSLEngineImpl.java:116)
       	- locked <0x00000000dbe884a8> (a sun.security.ssl.SSLEngineImpl)
       	at javax.net.ssl.SSLEngine.wrap(java.base@11.0.2/SSLEngine.java:479)
       	at io.undertow.server.protocol.http.ALPNLimitingSSLEngine.wrap(ALPNLimitingSSLEngine.java:64)
       	at io.undertow.protocols.ssl.SslConduit.doWrap(SslConduit.java:900)
       	at io.undertow.protocols.ssl.SslConduit.doUnwrap(SslConduit.java:671)
       	at io.undertow.protocols.ssl.SslConduit.read(SslConduit.java:567)
       	at io.undertow.conduits.IdleTimeoutConduit.read(IdleTimeoutConduit.java:202)
       	at org.xnio.conduits.ConduitStreamSourceChannel.read(ConduitStreamSourceChannel.java:127)
       	at io.undertow.server.protocol.framed.AbstractFramedChannel.receive(AbstractFramedChannel.java:373)
       	- locked <0x00000000dbe92268> (a io.undertow.protocols.http2.Http2Channel)
       	at io.undertow.server.protocol.http2.Http2ReceiveListener.handleEvent(Http2ReceiveListener.java:118)
       	at io.undertow.server.protocol.http2.Http2ReceiveListener.handleEvent(Http2ReceiveListener.java:67)
       	at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
       	at io.undertow.server.protocol.framed.AbstractFramedChannel$FrameReadListener.handleEvent(AbstractFramedChannel.java:948)
       	at io.undertow.server.protocol.framed.AbstractFramedChannel$FrameReadListener.handleEvent(AbstractFramedChannel.java:929)
       	at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
       	at org.xnio.conduits.ReadReadyHandler$ChannelListenerHandler.readReady(ReadReadyHandler.java:66)
       	at io.undertow.protocols.ssl.SslConduit$SslReadReadyHandler.readReady(SslConduit.java:1175)
       	at io.undertow.protocols.ssl.SslConduit$SslWriteReadyHandler.writeReady(SslConduit.java:1250)
       	at org.xnio.nio.NioSocketConduit.handleReady(NioSocketConduit.java:93)
       	at org.xnio.nio.WorkerThread.run(WorkerThread.java:561)
       

      IO workers are looping here:

                  SSLEngineResult result = null;
                  while (result == null || (result.getHandshakeStatus() == SSLEngineResult.HandshakeStatus.NEED_WRAP && result.getStatus() != SSLEngineResult.Status.BUFFER_OVERFLOW)) {
                      if (userBuffers == null) {
                          result = engine.wrap(EMPTY_BUFFER, wrappedData.getBuffer());
                      } else {
                          result = engine.wrap(userBuffers, off, len, wrappedData.getBuffer());
                      }
                  }
                  wrappedData.getBuffer().flip();
      

      Put non-suspending, logging breakpoint at SslConduit.doWrap():899 (in the loop):
      Expression:
      Thread.currentThread().getName() + ": loop: result=" + result + ", state=" + state + ", (userBuffers == null)=" + (userBuffers == null)
      Debug console output:

      Connected to the target VM, address: 'localhost:9001', transport: 'socket'
      XNIO-1 I/O-7: loop: result=Status = OK HandshakeStatus = NEED_WRAP bytesConsumed = 0 bytesProduced = 0, state=2053, (userBuffers == null)=true, connection.delegate.state=0, connection.getSslSession().getCreationTime()=1554196781900, connection.getSslSession().getPeerHost()=nnn.nnn.14.229, connection.getSslSession().getPeerPort()=33143
      XNIO-1 I/O-8: loop: result=Status = OK HandshakeStatus = NEED_WRAP bytesConsumed = 0 bytesProduced = 0, state=2053, (userBuffers == null)=true, connection.delegate.state=0, connection.getSslSession().getCreationTime()=1554230056259, connection.getSslSession().getPeerHost()=nnn.nnn.10.132, connection.getSslSession().getPeerPort()=57761
      XNIO-1 I/O-6: loop: result=Status = OK HandshakeStatus = NEED_WRAP bytesConsumed = 0 bytesProduced = 0, state=2053, (userBuffers == null)=true, connection.delegate.state=0, connection.getSslSession().getCreationTime()=1554230060635, connection.getSslSession().getPeerHost()=nnn.nnn.10.132, connection.getSslSession().getPeerPort()=57323
      XNIO-1 I/O-4: loop: result=Status = OK HandshakeStatus = NEED_WRAP bytesConsumed = 0 bytesProduced = 0, state=2053, (userBuffers == null)=true, connection.delegate.state=0, connection.getSslSession().getCreationTime()=1554230052518, connection.getSslSession().getPeerHost()=nnn.nnn.10.132, connection.getSslSession().getPeerPort()=45853
      XNIO-1 I/O-7: loop: result=Status = OK HandshakeStatus = NEED_WRAP bytesConsumed = 0 bytesProduced = 0, state=2053, (userBuffers == null)=true, connection.delegate.state=0, connection.getSslSession().getCreationTime()=1554196781900, connection.getSslSession().getPeerHost()=nnn.nnn.14.229, connection.getSslSession().getPeerPort()=33143
      

      System.currentTimeMillies() at debugging time: 1554381122007
      Time since SSL Session creation of connection nnn.nnn.14.229:33143: ~51h
      Flags set in state: SslConduit.FLAG_READ_REQUIRES_WRITE | SslConduit.FLAG_READS_RESUMED | SslConduit.FLAG_IN_HANDSHAKE == 2053

      Netstat output for the relevant connections:
      netstat -tnp | egrep '33143|57761|57323|45853|33143'

      tcp        1      0 nnn.nnn.nnn.119:8443       nnn.nnn.10.132:45853    CLOSE_WAIT  281/java
      tcp        1      0 nnn.nnn.nnn.119:8443       nnn.nnn.10.132:57323    CLOSE_WAIT  281/java
      tcp        1      0 nnn.nnn.nnn.119:8443       nnn.nnn.10.132:57761    CLOSE_WAIT  281/java
      

      No timeouts seem to be honored for connections hit by this issue. I saw connections lasting many days and Undertows internal connection objects even outlive the actual TCP connection without changing state.

      I consider this a minor security issue as it seems to allow for DoS attacks against applications served via Undertow.

      Attachments

        Issue Links

          Activity

            People

              flaviarnn Flavia Rainone
              lyind Jonas Zeiger (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: