Uploaded image for project: 'JBoss Transaction Manager'
  1. JBoss Transaction Manager
  2. JBTM-3830

TransactionReaper not shutting down

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Major Major
    • 7.0.1.Final
    • 7.0.0.Final
    • JTA
    • None
    • Hide

      This is unfortunately not deterministic. Locally in approximately 10% of the test runs involving the FipsDistTests that are not supposed to successfully start would have hanging behavior. Since we don't see nearly this frequency otherwise (with tests that have full / normal startup), the exceptional startup of Keycloak/Quarkus is likely contributing.

      Show
      This is unfortunately not deterministic. Locally in approximately 10% of the test runs involving the FipsDistTests that are not supposed to successfully start would have hanging behavior. Since we don't see nearly this frequency otherwise (with tests that have full / normal startup), the exceptional startup of Keycloak/Quarkus is likely contributing.
    • Fixes TransactionReaper not shutting down when in runs in TransactionReaper.DYNAMIC mode which is the default as defined in the CoordinatorEnvironmentBean bean.

      See https://github.com/keycloak/keycloak/issues/23786 - in CI we see frequent test errors due to our CI due to the keycloak process staying alive. After some investigation we identified:

      "main" #1 prio=5 os_prio=0 cpu=13493.67ms elapsed=113.30s tid=0x00007f7b5c0130b0 nid=0xfa8b3 in Object.wait()  [0x00007f7b614a2000]
         java.lang.Thread.State: WAITING (on object monitor)
      	at java.lang.Object.wait(java.base@17.0.1/Native Method)
      	- waiting on <no object reference available>
      	at java.lang.Object.wait(java.base@17.0.1/Object.java:338)
      	at com.arjuna.ats.arjuna.coordinator.TransactionReaper.shutdown(TransactionReaper.java:760)
      	- locked <0x00000000e2f01498> (a com.arjuna.ats.arjuna.coordinator.TransactionReaper)
      	at com.arjuna.ats.arjuna.coordinator.TransactionReaper.terminate(TransactionReaper.java:969)
      	- locked <0x00000000e2eef7f0> (a java.lang.Class for com.arjuna.ats.arjuna.coordinator.TransactionReaper)
      	at io.quarkus.narayana.jta.runtime.NarayanaJtaRecorder.lambda$handleShutdown$3(NarayanaJtaRecorder.java:171)
      	at io.quarkus.narayana.jta.runtime.NarayanaJtaRecorder$$Lambda$1058/0x000000080152c900.run(Unknown Source)
      	at io.quarkus.runtime.StartupContext.runAllInReverseOrder(StartupContext.java:84)
      	at io.quarkus.runtime.StartupContext.close(StartupContext.java:75)
      	at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
      	at io.quarkus.runtime.Application.start(Application.java:101)
      	at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:111)
      	at io.quarkus.runtime.Quarkus.run(Quarkus.java:71)
      	at org.keycloak.quarkus.runtime.KeycloakMain.start(KeycloakMain.java:117)
      	at org.keycloak.quarkus.runtime.cli.command.AbstractStartCommand.run(AbstractStartCommand.java:33)
      	at picocli.CommandLine.executeUserObject(CommandLine.java:2026)
      	at picocli.CommandLine.access$1500(CommandLine.java:148)
      	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
      	at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
      	at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
      	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
      	at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
      	at picocli.CommandLine.execute(CommandLine.java:2170)
      	at org.keycloak.quarkus.runtime.cli.Picocli.parseAndRun(Picocli.java:124)
      	at org.keycloak.quarkus.runtime.KeycloakMain.main(KeycloakMain.java:107)
      	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.1/Native Method)
      	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.1/NativeMethodAccessorImpl.java:77)
      	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.1/DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(java.base@17.0.1/Method.java:568)
      	at io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:61)
      	at io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:32)
      

      In reviewing the narayana code, there's a potential race condition - while the shutdown method is locked, the removeElementClient method is not and _inShutdown is not marked as volatile, but is accessed as if it is.

      It's possible for removeElementClient [1], to see _inShutdown as false even if the main thread has has entered the shutdown method - that would then cause the main method to not be notified and wait indefinitely.

      [1] https://github.com/jbosstm/narayana/blob/05f4a5c1069e7971b6f707cdf806a0bb7c6387c3/ArjunaCore/arjuna/classes/com/arjuna/ats/arjuna/coordinator/TransactionReaper.java#L779

              rhn-engineering-shawkins Steven Hawkins
              rhn-engineering-shawkins Steven Hawkins
              Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated:
                Resolved: