Uploaded image for project: 'WildFly'
  1. WildFly
  2. WFLY-13438

Database Timers: instances sporadically fail to start in a cluster

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 19.1.0.Final
    • Fix Version/s: None
    • Component/s: EJB
    • Labels:
      None
    • Steps to Reproduce:
      Hide
      • Wildfly in domain mode
      • configure database persistence for timers (see above)
      • create a server-group with three instances
      • deploy a WAR containing a stateless Bean with a `@Schedule` method as described above (persistent timer)
      • restart the server-group various times: after restarting 5 to 10 times the error should occur (it's what it takes for us)
      Show
      Wildfly in domain mode configure database persistence for timers (see above) create a server-group with three instances deploy a WAR containing a stateless Bean with a `@Schedule` method as described above (persistent timer) restart the server-group various times: after restarting 5 to 10 times the error should occur (it's what it takes for us)

      Description

      We use a postgresql database for persistent timers, to control the execution of scheduled tasks in a cluster. It mainly works, but when operating in a cluster and when a server-group is restarted, sometimes individual instances in the cluster fail to start because of an error when reinstating persistent timers.

      Configuration of the Datasource:

      <xa-datasource jndi-name="java:/datasources/MonitoringDS" pool-name="MonitoringDS" enabled="true">
          <xa-datasource-property name="ServerName">
              appserver.local
          </xa-datasource-property>
          <xa-datasource-property name="PortNumber">
              5432
          </xa-datasource-property>
          <xa-datasource-property name="DatabaseName">
              monitoringdb
          </xa-datasource-property>
          <xa-datasource-class&gt;org.postgresql.xa.PGXADataSource</xa-datasource-class&gt;
          <driver>postgresql</driver>
          <security>
              <user-name>xxx</user-name>
              <password>xxx</password>
          </security>
      </xa-datasource>
      

      Configuration of the timer service:

      <timer-service thread-pool-name="default" default-data-store="batch-clustered-store">
          <data-stores>
              <file-data-store name="default-file-store" path="timer-service-data" relative-to="jboss.server.data.dir"/>
              <database-data-store name="batch-clustered-store" datasource-jndi-name="java:/datasources/MonitoringDS" database="'postgresql'" partition="${batch.partition}" refresh-interval="60000" allow-execution="true"/>
          </data-stores>
      </timer-service>
      

      Note we use a system-property on the server-group to determine the partition name.

      Also, we made NO changes to the SQL statements in "timer-sql.properties".

      The timer is generated by a `@Schedule` annotation in an EJB like this:

          @Schedule(
                  dayOfMonth = "${manual.scheduler.dayofmonth:*}",
                  dayOfWeek = "${manual.scheduler.dayofweek:*}",
                  hour = "${manual.scheduler.hour:2/6}",
                  minute = "${manual.scheduler.minute:2}"
          )
          @TransactionAttribute(TransactionAttributeType.NEVER)
          void doPostboxImport() {
              log.info("Starting Postbox Import");
              BatchRuntime.getJobOperator().start("PostboxImportBatchlet", null);
          }
      

      The error we sporadically get when starting more than one instance at the same time (when all instances try to reinstate timers) is like this:

      14:48:28,117 WARN  [com.arjuna.ats.arjuna] (ServerService Thread Pool -- 69) {"tenant": "", "clientId": ""} ARJUNA012078: Abort called illegaly on atomic action 0:ffff7f000001:-27f2652e:5eb2b20f:f
      14:48:28,118 WARN  [org.jboss.as.ejb3.timer] (ServerService Thread Pool -- 69) {"tenant": "", "clientId": ""} WFLYEJB0161: Failed to reinstate timer 'postbox-integration-service.postbox-integration-service.PostboxImport' (id=unavailable) from its persistent state: javax.transaction.NotSupportedException: WFTXN0001: A transaction is already in progress
              at org.wildfly.transaction.client@1.1.9.Final//org.wildfly.transaction.client.ContextTransactionManager.begin(ContextTransactionManager.java:60)
              at org.wildfly.transaction.client@1.1.9.Final//org.wildfly.transaction.client.ContextTransactionManager.begin(ContextTransactionManager.java:54)
              at org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.getActivePersistentTimers(TimerServiceImpl.java:984)
              at org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.restoreTimers(TimerServiceImpl.java:712)
              at org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.activate(TimerServiceImpl.java:223)
              at org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.component.EJBComponent.init(EJBComponent.java:595)
              at org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.component.stateless.StatelessSessionComponent.init(StatelessSessionComponent.java:110)
              at org.jboss.as.ee@19.1.0.Final//org.jboss.as.ee.component.BasicComponent.start(BasicComponent.java:222)
              at org.jboss.as.ee@19.1.0.Final//org.jboss.as.ee.component.ComponentStartService$1.run(ComponentStartService.java:54)
              at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
              at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
              at org.jboss.threads@2.3.3.Final//org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
              at org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
              at org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
              at org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
              at java.base/java.lang.Thread.run(Thread.java:834)
              at org.jboss.threads@2.3.3.Final//org.jboss.threads.JBossThread.run(JBossThread.java:485)
      
      14:48:28,153 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 69) {"tenant": "", "clientId": ""} MSC000001: Failed to start service jboss.deployment.unit."postbox-integration-service-E-SNAPSHOT.war".component.PostboxImport.START: org.jboss.msc.service.StartException in service jboss.deployment.unit."postbox-integration-service-E-SNAPSHOT.war".component.PostboxImport.START: java.lang.RuntimeException: java.lang.NullPointerException
              at org.jboss.as.ee@19.1.0.Final//org.jboss.as.ee.component.ComponentStartService$1.run(ComponentStartService.java:57)
              at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
              at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
              at org.jboss.threads@2.3.3.Final//org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
              at org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
              at org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
              at org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
              at java.base/java.lang.Thread.run(Thread.java:834)
              at org.jboss.threads@2.3.3.Final//org.jboss.threads.JBossThread.run(JBossThread.java:485)
      Caused by: java.lang.RuntimeException: java.lang.NullPointerException
              at org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.persistTimer(TimerServiceImpl.java:626)
              at org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.createCalendarTimer(TimerServiceImpl.java:537)
              at org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.loadAutoTimer(TimerServiceImpl.java:388)
              at org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.restoreTimers(TimerServiceImpl.java:774)
              at org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.activate(TimerServiceImpl.java:223)
              at org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.component.EJBComponent.init(EJBComponent.java:595)
              at org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.component.stateless.StatelessSessionComponent.init(StatelessSessionComponent.java:110)
              at org.jboss.as.ee@19.1.0.Final//org.jboss.as.ee.component.BasicComponent.start(BasicComponent.java:222)
              at org.jboss.as.ee@19.1.0.Final//org.jboss.as.ee.component.ComponentStartService$1.run(ComponentStartService.java:54)
              ... 8 more
      Caused by: java.lang.NullPointerException
              at org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.persistence.database.DatabaseTimerPersistence.addTimer(DatabaseTimerPersistence.java:336)
              at org.jboss.as.ejb3@19.1.0.Final//org.jboss.as.ejb3.timerservice.TimerServiceImpl.persistTimer(TimerServiceImpl.java:607)
              ... 16 more
      

      The error seems similar to the one described in WFLY-13386, but the stack trace and the steps to reproduce it seem different to me, so I'm not sure if it's the same problem. The problem also existed in Wildfly 14 , we just updated to 19.1 (also hoping that his error would be corrected).

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

                People

                • Assignee:
                  cfang Cheng Fang
                  Reporter:
                  marcos-scholtz Marcos Scholtz
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  5 Start watching this issue

                  Dates

                  • Created:
                    Updated: