ModeShape / MODE-1347

ModeShape stuck - race condition in RepositoryConnectionPool


      Setup: JBoss AS 6, latest ModeShape.

      I started 30 requests in parallel and it got stuck (nothing is happening).

      After taking a thread dump and looking at the code, it seems to me that we have a race condition in RepositoryConnectionPool.

      When we try to get a connection, we first do the following (only the interesting part is shown):

      mainLock.lock();
      
      // Peek to see if there is a connection available ...
      else if (this.availableConnections.peek() != null) {
          // There is, so take it and return it ...
          try {
              connection = this.availableConnections.take();
          } catch (InterruptedException e) {
              LOGGER.trace("Cancelled obtaining a repository connection from pool {0}", getSourceName());
              Thread.interrupted();
              throw new RepositorySourceException(getSourceName(), e);
          }
      }
      

      The race condition is between the 'peek' and the 'take'.
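To make the check-then-act gap concrete, here is a minimal sketch that replays the interleaving on a single thread so the outcome is deterministic. The class name, the String "connections" and the queue variable are hypothetical stand-ins and are not ModeShape code. The second step uses poll() rather than take() only so that the demo returns instead of blocking.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PeekThenTakeGap {

    /**
     * Replays the problematic interleaving on one thread.
     * Returns true if the queue was already empty when the first caller
     * tried to remove the element it had just peeked at.
     */
    static boolean peekedElementVanished() {
        // Hypothetical stand-in for availableConnections, holding one pooled connection.
        BlockingQueue<String> available = new LinkedBlockingQueue<>();
        available.add("connection-1");

        // Caller A, the check: the queue looks non-empty.
        boolean sawConnection = available.peek() != null;

        // Caller B removes the element in between (in the pool, an unguarded take()).
        available.poll();

        // Caller A, the act: take() would block here; poll() just returns null.
        String connection = available.poll();
        return sawConnection && connection == null;
    }

    public static void main(String[] args) {
        System.out.println("peeked element vanished: " + peekedElementVanished());
    }
}
```

A successful peek() only says that the queue was non-empty at that instant. It guarantees nothing about a later take() unless every consumer removes elements under the same lock.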

      The race arises because, further down in the same method, we do:

      if (connection == null) {
         // There are not enough connections, so wait in line for the
         // next available connection ...
         LOGGER.trace("Waiting for a repository connection from pool {0}", getSourceName());
         try {
             connection = this.availableConnections.take();
         } catch (InterruptedException e) {
             LOGGER.trace("Cancelled obtaining a repository connection from pool {0}", getSourceName());
             Thread.interrupted();
             throw new RepositorySourceException(getSourceName(), e);
         }
      mainLock = this.mainLock;
      mainLock.lock();
      

      So we call 'take' here without holding the mainLock.

      And this is IMO what happened.

      I have a thread that is blocked in the first 'take' (after the 'peek') and another one that is stuck while trying to get the lock after calling 'take'.

      It is this second 'take' that slipped in between the first thread's 'peek' and 'take'.

      Note that I have 9 other threads that are trying to release their connections but are also blocked trying to get the lock.
      This also explains why the first thread is stuck in 'take': all the connections are taken.
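The stall described above can be reproduced in isolation. The following is a sketch under stated assumptions, not the actual RepositoryConnectionPool code: the class name, the latches and the single String "connection" are hypothetical, and the latches exist only to force the unlucky interleaving. Thread A plays the caller that peeks and takes while holding the lock. The main thread plays the caller that takes without the lock and then tries to acquire it, which is also what the releasing threads are waiting for.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class PoolDeadlockSketch {

    /** Returns true if the lock could not be acquired while thread A was stuck in take(). */
    static boolean reproducesStall() throws InterruptedException {
        ReentrantLock mainLock = new ReentrantLock();
        BlockingQueue<String> available = new LinkedBlockingQueue<>();
        available.add("connection-1");

        CountDownLatch peeked = new CountDownLatch(1); // A has seen a connection
        CountDownLatch stolen = new CountDownLatch(1); // main thread has removed it

        Thread a = new Thread(() -> {
            mainLock.lock();
            try {
                if (available.peek() != null) {
                    peeked.countDown();
                    stolen.await();   // force the interleaving for the demo
                    available.take(); // blocks: queue is empty and the lock is still held
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                mainLock.unlock();
            }
        });
        a.start();

        peeked.await();
        String connection = available.take(); // the unguarded take(), no lock held
        stolen.countDown();

        // Anyone who now needs the lock (this thread, or a thread releasing a
        // connection) waits behind A, and A waits for a connection to come back.
        boolean gotLock = mainLock.tryLock(200, TimeUnit.MILLISECONDS);
        if (gotLock) {
            mainLock.unlock();
        }

        // Break the stall so the demo terminates.
        a.interrupt();
        a.join();
        return connection != null && !gotLock;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("stalled: " + reproducesStall());
    }
}
```

In the sketch the stall is broken by interrupting thread A. In the reported scenario nothing interrupts the blocked thread, so the pool stays stuck.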

              hchiorean Horia Chiorean (Inactive)
              jamat Juan AMAT (Inactive)