Teiid / TEIID-1058

SessionCleanupThread causes deadlocks when server is clustered

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Out of Date
    • None
    • 8.0
    • Server
    • None

    Description

      Note: The following problem exists in MetaMatrix 5.5.x. I am creating this Teiid JIRA so that a fix for it (if necessary) can be considered in that product. There are no plans to fix it in MetaMatrix 5.5.x.

      When SessionID cleanup is done, either at a scheduled interval or at server shutdown, there will sometimes be deadlocks if the MetaMatrix servers are clustered and/or have multiple processes.

      This is because every process performs the full cleanup on whatever inactive and aged sessionids are in the database, and each one does all of its deletes in a single transaction, no matter how many sessionids need to be deleted.

      This is highly redundant and prone to deadlocks, because the concurrent transactions contend for locks on the same rows. If we had a concept of a 'lead' host in a cluster, it could be assigned this task, but we don't.

      Possible solutions:
      1) Add an ORDER BY to the SELECT we use to retrieve the old sessionids.  This forces the processes to delete the sessionids in the same order, which should reduce the likelihood of true deadlocks. However, in testing it did not prevent the deadlocks.

      2) Refactor the cleanup code so that it does a smaller amount of work per transaction.  Currently each host will attempt to do the entire cleanup process in a single transaction.  With a default TTL of 10 hours, that could be hundreds or thousands of deletable sessionids in a busy environment.  The process could be rewritten to discover/delete/commit a few (say 10, or even just 1) at a time.

      3) Add "FOR UPDATE NOWAIT" to the SELECT that is used to retrieve the sessionids that are ready to delete.
      But this syntax is legal only in Oracle and PostgreSQL, so it may not be a good solution.
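
      Option 2 could be sketched roughly as follows. This is only an illustration, not the MetaMatrix or Teiid code: the SessionStore interface, the InMemoryStore class, the method names, and the use of Long for sessionids are all hypothetical. In a real server the three store methods would wrap JDBC calls on a connection with auto-commit turned off, and findExpired would use the ORDER BY from option 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class BatchedSessionCleanup {

    /** Hypothetical storage abstraction; a real one would wrap JDBC calls. */
    interface SessionStore {
        /** Returns at most 'limit' expired sessionids, in ascending order. */
        List<Long> findExpired(int limit);

        /** Deletes the given ids; returns how many this caller actually removed. */
        int delete(List<Long> ids);

        /** Commits the current transaction, releasing the locks it holds. */
        void commit();
    }

    /**
     * Discovers, deletes, and commits expired sessions 'batchSize' at a time,
     * so no transaction ever holds locks on more than 'batchSize' rows.
     */
    static int cleanup(SessionStore store, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int total = 0;
        while (true) {
            List<Long> batch = store.findExpired(batchSize);
            if (batch.isEmpty()) {
                return total; // nothing left to clean up
            }
            // Ids already removed by another process simply do not count.
            total += store.delete(batch);
            store.commit();
        }
    }

    /** In-memory stand-in for the session table, used only for this demo. */
    static class InMemoryStore implements SessionStore {
        private final TreeSet<Long> expired = new TreeSet<>();
        int commits = 0;

        InMemoryStore(long... ids) {
            for (long id : ids) {
                expired.add(id);
            }
        }

        public List<Long> findExpired(int limit) {
            List<Long> out = new ArrayList<>();
            for (Long id : expired) {
                if (out.size() == limit) {
                    break;
                }
                out.add(id);
            }
            return out;
        }

        public int delete(List<Long> ids) {
            int removed = 0;
            for (Long id : ids) {
                if (expired.remove(id)) {
                    removed++;
                }
            }
            return removed;
        }

        public void commit() {
            commits++;
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore(5, 3, 9, 1, 7);
        int deleted = cleanup(store, 2);
        System.out.println(deleted + " sessions deleted in " + store.commits + " commits");
    }
}
```

      With a batch size of 2, the five sessionids in the demo are removed in three short transactions instead of one long one. Smaller batches narrow the window in which two processes hold locks on overlapping rows, at the cost of more round trips; whether that is enough to eliminate the deadlocks would still need to be tested against a real clustered database.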

      We have decided not to pursue this fix in 5.5.4. Whether the conditions will be the same in Teiid is not clear, but if they are, this note is offered as a contribution toward solving it in Teiid.

          People

            rhn-engineering-shawkins Steven Hawkins
            ghelblin Jerry Helbling (Inactive)