JBoss Enterprise Application Platform / JBEAP-24731

Batch job fails to restart on server resume after server suspend


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • Component/s: Batch, EJB, Security

      There appears to be an issue with restarting Batch jobs when the server is suspended and then resumed. It happens when the following attribute is set:

      /subsystem=batch-jberet:write-attribute(name=restart-jobs-on-resume,value=true)
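      A minimal JBoss CLI sketch of the suspend/resume scenario might look like the following. It is not the reporter's actual test. It assumes the batch-suspend.jar deployment and the records-batchlet job named in the log, that the job is started through the deployment's start-job management operation rather than from application code, and that the job is still running when the server is suspended.

```
# Restart batch jobs that were stopped by a suspend once the server resumes
/subsystem=batch-jberet:write-attribute(name=restart-jobs-on-resume,value=true)
# If the previous step reports reload-required, run: reload

# Assumption: start the job via the deployment's start-job operation
/deployment=batch-suspend.jar/subsystem=batch-jberet:start-job(job-xml-name=records-batchlet)

# Suspend while the job is still running; running batch jobs are stopped
:suspend

# Resume; stopped jobs should now be restarted (restartStoppedJobs in the trace)
:resume

# The server should report RUNNING again
:read-attribute(name=suspend-state)
```

      With the regression present, the :resume step is where WFLYBATCH000016 is logged.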
      

      The following error appears in the log:

      09:07:30,017 ERROR [org.wildfly.extension.batch] (management-handler-thread - 1) WFLYBATCH000016: Failed to restart execution 1 for job records-batchlet on deployment batch-suspend.jar: org.wildfly.security.authz.AuthorizationFailureException: ELY01088: Attempting to run as "$local" authorization operation failed
      	at org.wildfly.security.elytron-base@2.1.0.Final//org.wildfly.security.auth.server.SecurityIdentity.createRunAsIdentity(SecurityIdentity.java:750)
      	at org.wildfly.security.elytron-base@2.1.0.Final//org.wildfly.security.auth.server.SecurityIdentity.createRunAsIdentity(SecurityIdentity.java:725)
      	at org.wildfly.extension.batch.jberet@28.0.0.Beta1//org.wildfly.extension.batch.jberet.deployment.JobOperatorService$BatchJobServerActivity.privilegedRunAs(JobOperatorService.java:568)
      	at org.wildfly.extension.batch.jberet@28.0.0.Beta1//org.wildfly.extension.batch.jberet.deployment.JobOperatorService$BatchJobServerActivity.restartStoppedJobs(JobOperatorService.java:543)
      	at org.wildfly.extension.batch.jberet@28.0.0.Beta1//org.wildfly.extension.batch.jberet.deployment.JobOperatorService$BatchJobServerActivity.resume(JobOperatorService.java:458)
      	at org.jboss.as.server@20.0.0.Beta8//org.jboss.as.server.suspend.SuspendController.resume(SuspendController.java:128)
      	at org.jboss.as.server@20.0.0.Beta8//org.jboss.as.server.suspend.SuspendController.resume(SuspendController.java:106)
      	at org.jboss.as.server@20.0.0.Beta8//org.jboss.as.server.operations.ServerResumeHandler$1$1.handleResult(ServerResumeHandler.java:74)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.AbstractOperationContext$Step.invokeResultHandler(AbstractOperationContext.java:1570)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.AbstractOperationContext$Step.handleResult(AbstractOperationContext.java:1552)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.AbstractOperationContext$Step.finalizeInternal(AbstractOperationContext.java:1509)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.AbstractOperationContext$Step.finalizeStep(AbstractOperationContext.java:1482)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.AbstractOperationContext.executeResultHandlerPhase(AbstractOperationContext.java:910)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.AbstractOperationContext.executeDoneStage(AbstractOperationContext.java:896)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.AbstractOperationContext.processStages(AbstractOperationContext.java:803)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:466)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.OperationContextImpl.executeOperation(OperationContextImpl.java:1431)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.ModelControllerImpl.internalExecute(ModelControllerImpl.java:448)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.ModelControllerImpl.lambda$executeForResponse$0(ModelControllerImpl.java:259)
      	at org.wildfly.security.elytron-base@2.1.0.Final//org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:304)
      	at org.wildfly.security.elytron-base@2.1.0.Final//org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:270)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.ModelControllerImpl.executeForResponse(ModelControllerImpl.java:259)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.ModelControllerImpl.executeOperation(ModelControllerImpl.java:253)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.ModelControllerImpl.execute(ModelControllerImpl.java:236)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler.doExecute(ModelControllerClientOperationHandler.java:241)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler$1$1.run(ModelControllerClientOperationHandler.java:163)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler$1$1.run(ModelControllerClientOperationHandler.java:159)
      	at org.wildfly.security.elytron-base@2.1.0.Final//org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:328)
      	at org.wildfly.security.elytron-base@2.1.0.Final//org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:285)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.AccessAuditContext.doAs(AccessAuditContext.java:254)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.AccessAuditContext.doAs(AccessAuditContext.java:225)
      	at org.jboss.as.controller@20.0.0.Beta8//org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler$1.execute(ModelControllerClientOperationHandler.java:159)
      	at org.jboss.as.protocol@20.0.0.Beta8//org.jboss.as.protocol.mgmt.ManagementRequestContextImpl$1.doExecute(ManagementRequestContextImpl.java:70)
      	at org.jboss.as.protocol@20.0.0.Beta8//org.jboss.as.protocol.mgmt.ManagementRequestContextImpl$AsyncTaskRunner.run(ManagementRequestContextImpl.java:160)
      	at org.jboss.threads@2.4.0.Final//org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
      	at org.jboss.threads@2.4.0.Final//org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1990)
      	at org.jboss.threads@2.4.0.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
      	at org.jboss.threads@2.4.0.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
      	at java.base/java.lang.Thread.run(Thread.java:829)
      	at org.jboss.threads@2.4.0.Final//org.jboss.threads.JBossThread.run(JBossThread.java:513)
      
      2023-04-06 09:07:30.019 INFO  o.j.q.s.s.StandaloneServerManager: Waiting for server to be running
      2023-04-06 09:07:30.519 INFO  o.w.e.c.c.o.OnlineManagementClient: Reconnecting the client
      2023-04-06 09:07:30.578 INFO  org.jboss.as.cli.CommandContext: Warning! The CLI is running in a non-modular environment and cannot load commands from management extensions.
      2023-04-06 09:07:30.586 DEBUG o.w.e.c.c.o.OnlineManagementClient: Executing operation /:read-children-types
      2023-04-06 09:07:30.589 DEBUG o.w.e.c.c.o.OnlineManagementClient: Executing operation /:read-attribute(name=server-state)
      2023-04-06 09:07:30.591 DEBUG o.w.e.c.c.o.OnlineManagementClient: Executing operation /:read-attribute(name=server-state)
      2023-04-06 09:07:30.593 DEBUG o.w.e.c.c.o.OnlineManagementClient: Executing operation /:read-attribute(name=suspend-state)
      2023-04-06 09:07:30.595 INFO  o.j.q.s.s.StandaloneServerManager: Current suspend state is: RUNNING
      

      Based on the error, this is an authorization problem, yet the Batch job was started successfully before the server was suspended and resumed. Is it possible that some auth token or security context was discarded in the meantime?

      This commit seems to be the culprit (WFLY-16863); see the relevant PR.

      I was also able to identify this commit (WFLY-17156) as the culprit: before it, the restart works as expected; after it (tested with that commit and with WildFly 28.0.0.Beta1), the error above is produced. To be honest, from the changes in that commit I cannot tell how it affects this behavior. My guess is something in its deployment processor changes, but I am not sure.

      This issue is already present in the released wildfly-28.0.0.Beta1.zip.

              tadamski@redhat.com Tomasz Adamski
              jstourac@redhat.com Jan Stourac
              Votes: 0
              Watchers: 4
