  1. JBoss Enterprise Application Platform
  2. JBEAP-11818

Need improvement of jBeret so that a job failure by Error does not cause failures of subsequent jobs.

Details

    • Bug
    • Resolution: Done
    • Major
    • 7.1.0.ER3
    • 7.0.0.GA, 7.0.6.GA
    • Batch
    • None
    • ER3

    Description

      Once a job fails with a java.lang.Error (or any subclass of java.lang.Error), as in (*1), on a batch thread, any subsequent job that is assigned to and runs on the same thread fails with JBERET error messages like (*2).
      When a job fails with an Exception on a batch thread, the job's transaction is rolled back successfully, so the next job on the same thread is processed successfully. However, when a job fails with an Error, the transaction is neither committed nor rolled back, so the next job assigned to the same thread cannot be processed because the thread is still associated with that transaction.
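
The difference between the two failure modes can be reproduced in miniature. The following is only a simulation under stated assumptions, not JBeret's or Narayana's actual code: the class `LeakedTransactionDemo`, its `ThreadLocal` flag, and the `begin`/`commit`/`rollback` helpers are hypothetical stand-ins for a thread-bound JTA transaction, and a single-thread executor stands in for one batch thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LeakedTransactionDemo {
    // Hypothetical stand-in: true while the current thread is associated with a transaction.
    static final ThreadLocal<Boolean> TX_ACTIVE = ThreadLocal.withInitial(() -> false);

    static void begin() {
        if (TX_ACTIVE.get()) {
            throw new IllegalStateException("thread is already associated with a transaction!");
        }
        TX_ACTIVE.set(true);
    }

    static void commit() { TX_ACTIVE.set(false); }

    static void rollback() { TX_ACTIVE.set(false); }

    // Runs one job. Only Exception triggers a rollback, mirroring the reported behavior.
    static String runJob(Runnable work) {
        try {
            begin();
        } catch (IllegalStateException e) {
            return "FAILED (" + e.getMessage() + ")";
        }
        try {
            work.run();
            commit();
            return "COMPLETED";
        } catch (Exception e) {
            rollback();
            return "FAILED (rolled back)";
        } catch (Error e) {
            // Neither commit nor rollback: the thread stays associated with the transaction.
            return "FAILED (transaction leaked)";
        }
    }

    public static void main(String[] args) throws Exception {
        // One worker thread, so every job below runs on the same "batch thread".
        ExecutorService batchThread = Executors.newSingleThreadExecutor();
        try {
            Callable<String> failsWithException =
                    () -> runJob(() -> { throw new RuntimeException("bad input"); });
            Callable<String> failsWithError =
                    () -> runJob(() -> { throw new NoSuchMethodError(); });
            Callable<String> good = () -> runJob(() -> { });

            System.out.println("job failing with Exception: " + batchThread.submit(failsWithException).get());
            System.out.println("good job afterwards:        " + batchThread.submit(good).get());
            System.out.println("job failing with Error:     " + batchThread.submit(failsWithError).get());
            System.out.println("good job afterwards:        " + batchThread.submit(good).get());
        } finally {
            batchThread.shutdown();
        }
    }
}
```

In this simulation the good job that follows the Exception completes, while the good job that follows the Error fails at `begin()`, which corresponds to the ARJUNA016051 message in (*2).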

      An improvement is requested so that a job failure caused by an Error does not cause failures of subsequent jobs. The customer understands (*3). However, they are requesting an improvement at least for Errors that affect only the job or application (e.g. NoSuchMethodError, LinkageError, AnnotationFormatError, AssertionError), even though it might be difficult to handle Errors that affect the entire Java process, such as OutOfMemoryError.
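
One possible way to draw the line between the two kinds of Error is sketched below. This is an assumption made for illustration, not a rule taken from JBeret: the class `ErrorScope` and the method `isJobScoped` are hypothetical, and the sketch simply treats every `VirtualMachineError` (the JDK superclass of `OutOfMemoryError`, `StackOverflowError` and `InternalError`) as process-wide and every other `Error` as job-scoped. A real policy might well classify some cases differently.

```java
import java.lang.annotation.AnnotationFormatError;

class ErrorScope {
    // Assumption for illustration: VirtualMachineError means the JVM itself is in trouble;
    // any other Error (LinkageError and its subclasses, AssertionError, ...) is job-scoped.
    static boolean isJobScoped(Throwable t) {
        if (t instanceof VirtualMachineError) {
            return false;
        }
        return t instanceof Error;
    }

    public static void main(String[] args) {
        Throwable[] samples = {
            new NoSuchMethodError(),               // subclass of LinkageError
            new AnnotationFormatError("bad annotation"),
            new AssertionError("assertion failed"),
            new OutOfMemoryError()                 // subclass of VirtualMachineError
        };
        for (Throwable t : samples) {
            System.out.println(t.getClass().getSimpleName() + " job-scoped: " + isJobScoped(t));
        }
    }
}
```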

      The attached sample program batchtimeout.jar does the following:

      1. 10 bad jobs are assigned to 10 batch threads.
      2. All 10 bad jobs fail with a NoSuchMethodError thrown from the sample program.
      3. Next, 10 good jobs are assigned to the same 10 batch threads.
      4. All 10 good jobs fail with a JBERET error message.

      Test Procedure:

      1. While the EAP 7 server is running, copy batchtimeout.jar to the deployments directory.
        $ cp batchtimeout.jar ${jboss.server.base.dir}/deployments
        
      2. batchtimeout.jar is deployed and starts running automatically, and you can see org.jberet ERROR messages like (*1) and (*2).

      (*1)

      2017-06-21 11:23:49,792 ERROR [org.jberet] (Batch Thread - 5) JBERET000007: Failed to run job badjob, step1, org.jberet.job.model.Chunk@6674e4f1: java.lang.NoSuchMethodError
              at batch.TestReader.readItem(TestReader.java:16)
              at org.jberet.runtime.runner.ChunkRunner.readItem(ChunkRunner.java:359)
              at org.jberet.runtime.runner.ChunkRunner.readProcessWriteItems(ChunkRunner.java:305)
              at org.jberet.runtime.runner.ChunkRunner.run(ChunkRunner.java:201)
              at org.jberet.runtime.runner.StepExecutionRunner.runBatchletOrChunk(StepExecutionRunner.java:226)
              at org.jberet.runtime.runner.StepExecutionRunner.run(StepExecutionRunner.java:147)
              at org.jberet.runtime.runner.CompositeExecutionRunner.runStep(CompositeExecutionRunner.java:164)
              at org.jberet.runtime.runner.CompositeExecutionRunner.runFromHeadOrRestartPoint(CompositeExecutionRunner.java:88)
              at org.jberet.runtime.runner.JobExecutionRunner.run(JobExecutionRunner.java:60)
              at org.wildfly.extension.batch.jberet.impl.BatchEnvironmentService$WildFlyBatchEnvironment$1.run(BatchEnvironmentService.java:243)
              at org.wildfly.extension.requestcontroller.RequestController$QueuedTask$1.run(RequestController.java:497)
              at org.jberet.spi.JobExecutor$3.run(JobExecutor.java:161)
              at org.jberet.spi.JobExecutor$1.run(JobExecutor.java:99)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
              at org.jboss.threads.JBossThread.run(JBossThread.java:320)
      

      (*2)

      2017-06-21 11:23:56,849 ERROR [org.jberet] (Batch Thread - 5) JBERET000007: Failed to run job goodjob, step1, org.jberet.job.model.Chunk@1a1e6ee: javax.transaction.NotSupportedException: BaseTransaction.checkTransactionState - ARJUNA016051: thread is already associated with a transaction!
              at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.begin(BaseTransaction.java:73)
              at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.begin(BaseTransactionManagerDelegate.java:78)
              at org.jberet.runtime.runner.ChunkRunner.run(ChunkRunner.java:187)
              at org.jberet.runtime.runner.StepExecutionRunner.runBatchletOrChunk(StepExecutionRunner.java:226)
              at org.jberet.runtime.runner.StepExecutionRunner.run(StepExecutionRunner.java:147)
              at org.jberet.runtime.runner.CompositeExecutionRunner.runStep(CompositeExecutionRunner.java:164)
              at org.jberet.runtime.runner.CompositeExecutionRunner.runFromHeadOrRestartPoint(CompositeExecutionRunner.java:88)
              at org.jberet.runtime.runner.JobExecutionRunner.run(JobExecutionRunner.java:60)
              at org.wildfly.extension.batch.jberet.impl.BatchEnvironmentService$WildFlyBatchEnvironment$1.run(BatchEnvironmentService.java:243)
              at org.wildfly.extension.requestcontroller.RequestController$QueuedTask$1.run(RequestController.java:497)
              at org.jberet.spi.JobExecutor$3.run(JobExecutor.java:161)
              at org.jberet.spi.JobExecutor$1.run(JobExecutor.java:99)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
              at org.jboss.threads.JBossThread.run(JBossThread.java:320)
      Caused by: java.lang.IllegalStateException: BaseTransaction.checkTransactionState - ARJUNA016051: thread is already associated with a transaction!
              at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.checkTransactionState(BaseTransaction.java:264)
              at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.begin(BaseTransaction.java:68)
              ... 15 more
      

      (*3) Error (Java Platform SE 8)
      https://docs.oracle.com/javase/8/docs/api/java/lang/Error.html
      An Error is a subclass of Throwable that indicates serious problems that a reasonable application should not try to catch.

      Background:

      The customer deploys a large number of applications. They frequently modify method signatures in application classes (e.g. by adding new arguments) to add or modify application features. However, operational mistakes sometimes happen: for example, an operator/deployer redeploys jar files containing modified classes but forgets to redeploy the caller application that depends on those jar files. In such a case, a job fails with NoSuchMethodError. (In the sample test case attached to this report, NoSuchMethodError is thrown intentionally; their actual application does not throw it itself.) The customer does not see this job failure itself as a problem, and they can resolve the NoSuchMethodError by redeploying the caller application correctly. However, they do consider it a problem that the failure can cause failures of subsequent, innocent jobs. They need to restart the EAP server immediately because it can practically no longer function as a batch system, which has a big negative impact on service continuity.

      The customer's request is:

      • Improve jBeret/EAP so that a job failure caused by an Error does not cause failures of subsequent jobs.
      • In particular, when an Error that affects only the job/application (e.g. NoSuchMethodError, LinkageError, AnnotationFormatError, AssertionError) occurs in a job, jBeret should disassociate the job's transaction from the thread (roll back the transaction) so that subsequent jobs assigned to the same thread can be processed normally.
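
The requested behavior could look roughly like the sketch below. It is a minimal illustration under assumptions, not the fix that was actually implemented: `RollbackOnErrorSketch`, the `ThreadLocal` flag, and the `begin`/`commit`/`rollback` helpers are hypothetical stand-ins for a thread-bound JTA transaction. The key point is the `finally` block, which rolls back any transaction still associated with the thread regardless of whether the job failed with an Exception or an Error.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class RollbackOnErrorSketch {
    // Hypothetical stand-in: true while the current thread is associated with a transaction.
    static final ThreadLocal<Boolean> TX_ACTIVE = ThreadLocal.withInitial(() -> false);

    static void begin() {
        if (TX_ACTIVE.get()) {
            throw new IllegalStateException("thread is already associated with a transaction!");
        }
        TX_ACTIVE.set(true);
    }

    static void commit() { TX_ACTIVE.set(false); }

    static void rollback() { TX_ACTIVE.set(false); }

    // Runs one job and always leaves the thread without an associated transaction.
    static String runJob(Runnable work) {
        begin();
        try {
            work.run();
            commit();
            return "COMPLETED";
        } catch (Throwable t) {
            // Catches Error as well as Exception; the job is reported as failed.
            return "FAILED (" + t.getClass().getSimpleName() + ")";
        } finally {
            // If the job did not reach commit(), disassociate the transaction from the thread.
            if (TX_ACTIVE.get()) {
                rollback();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // One worker thread, so both jobs run on the same "batch thread".
        ExecutorService batchThread = Executors.newSingleThreadExecutor();
        try {
            Callable<String> badJob = () -> runJob(() -> { throw new NoSuchMethodError(); });
            Callable<String> goodJob = () -> runJob(() -> { });

            System.out.println("badjob:  " + batchThread.submit(badJob).get());   // FAILED (NoSuchMethodError)
            System.out.println("goodjob: " + batchThread.submit(goodJob).get());  // COMPLETED
        } finally {
            batchThread.shutdown();
        }
    }
}
```

The sketch swallows every Throwable for brevity. A real implementation might prefer to rethrow process-wide failures such as OutOfMemoryError after the cleanup rather than report them as an ordinary job failure.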

      Attachments

        1. batchtimeout.jar
          6 kB
        2. batchtimeout-processor.jar
          8 kB
        3. batchtimeout-writer.jar
          7 kB
        4. jberet-tx-after-error.png
          247 kB

            People

              cfang@redhat.com Cheng Fang
              rhn-support-myoshida Masato Yoshida