WildFly / WFLY-13357

(Regression) Execution of concurrent batch jobs containing partitioned steps causes deadlock

    • Type: Bug
    • Resolution: Done
    • Priority: Blocker
    • 20.0.0.Final
    • 19.0.0.Final
    • Batch
    • None

      Hello,

      the issue described in JBERET-180 seems to have reappeared. I am running WildFly 16 with jberet-1.3.3. With the default batch thread count of 10, I was able to produce a deadlock by starting 10 instances of a partitioned job simultaneously. None of the jobs runs fast enough to finish before all 10 jobs have been started. All 10 batch threads are stuck here:

      "Batch Thread - 1@33537" prio=5 tid=0x109 nid=NA waiting
        java.lang.Thread.State: WAITING
      	  at jdk.internal.misc.Unsafe.park(Unknown Source:-1)
      	  at java.util.concurrent.locks.LockSupport.park(Unknown Source:-1)
      	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source:-1)
      	  at java.util.concurrent.ArrayBlockingQueue.take(Unknown Source:-1)
      	  at org.jberet.runtime.runner.StepExecutionRunner.beginPartition(StepExecutionRunner.java:350)
      	  at org.jberet.runtime.runner.StepExecutionRunner.runBatchletOrChunk(StepExecutionRunner.java:222)
      	  at org.jberet.runtime.runner.StepExecutionRunner.run(StepExecutionRunner.java:144)
      	  at org.jberet.runtime.runner.CompositeExecutionRunner.runStep(CompositeExecutionRunner.java:164)
      	  at org.jberet.runtime.runner.CompositeExecutionRunner.runFromHeadOrRestartPoint(CompositeExecutionRunner.java:88)
      	  at org.jberet.runtime.runner.JobExecutionRunner.run(JobExecutionRunner.java:60)
      	  at org.wildfly.extension.batch.jberet.deployment.BatchEnvironmentService$WildFlyBatchEnvironment$1.run(BatchEnvironmentService.java:180)
      	  at org.wildfly.extension.requestcontroller.RequestController$QueuedTask$1.run(RequestController.java:494)
      	  at org.jberet.spi.JobExecutor$2.run(JobExecutor.java:149)
      	  at org.jberet.spi.JobExecutor$1.run(JobExecutor.java:99)
      	  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source:-1)
      	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source:-1)
      	  at java.lang.Thread.run(Unknown Source:-1)
      	  at org.jboss.threads.JBossThread.run(JBossThread.java:485)
      

      which is this line of code:

      completedPartitionThreads.take();
      

      In rare cases, some threads get stuck at line 364 of StepExecutionRunner.java instead, which is:

      final Serializable data = collectorDataQueue.take();
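
The stack trace above is consistent with thread-pool starvation: every batch thread runs a job that blocks in `take()` while the partitions it is waiting for sit in the pool's queue with no free thread to run them. The following is a minimal standalone sketch of that mechanism, not JBeret code. It assumes that partitions are submitted to the same bounded pool that runs the jobs; the class `PartitionDeadlockSketch`, the method `runJobs`, and the start latch that forces all jobs to be running before any partition is submitted are hypothetical and exist only for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PartitionDeadlockSketch {

    /**
     * Runs `jobs` jobs on a fixed pool of `poolSize` threads. Each job submits
     * `partitions` partition tasks to the SAME pool and then blocks until all of
     * them have reported completion. Returns true if every job finished within
     * the timeout, false otherwise (for example, because of a deadlock).
     */
    static boolean runJobs(int poolSize, int jobs, int partitions, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch allJobsSubmitted = new CountDownLatch(1);
        CountDownLatch finishedJobs = new CountDownLatch(jobs);

        for (int j = 0; j < jobs; j++) {
            pool.execute(() -> {
                try {
                    // Mimic "no job finishes before all jobs have been started".
                    allJobsSubmitted.await();

                    BlockingQueue<Boolean> completedPartitions =
                            new ArrayBlockingQueue<>(partitions);
                    for (int p = 0; p < partitions; p++) {
                        // Partition work competes for the same threads as the jobs.
                        pool.execute(() -> completedPartitions.add(Boolean.TRUE));
                    }
                    for (int p = 0; p < partitions; p++) {
                        // Blocks forever if no pool thread is free to run a partition.
                        completedPartitions.take();
                    }
                    finishedJobs.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        allJobsSubmitted.countDown();

        try {
            return finishedJobs.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupt any blocked job threads so the JVM can exit.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool=2, jobs=2: finished=" + runJobs(2, 2, 3, 500));
        System.out.println("pool=4, jobs=2: finished=" + runJobs(4, 2, 3, 5000));
    }
}
```

In this sketch, the jobs never finish when the number of concurrently started jobs equals the pool size, and they finish as soon as at least one thread is left over for partitions. Whether the real pool behaves the same way depends on how the batch thread pool is configured.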
      

              Assignee: Cheng Fang (cfang@redhat.com)
              Reporter: Felix König (felk_) (Inactive)