• Type: Bug
    • Resolution: Done
    • Priority: Major
    • 5.2.0

      ORBRunner.java starts an orb using orb().run() but then performs operations on the orb after the run() method returns. According to the CORBA spec this is invalid:

      Once an ORB has shutdown, only object reference management operations(duplicate,
      release and is_nil) may be invoked on the ORB or any object reference obtained
      from it.

      Note that when the orb.run() method returns, the orb has already shut down because, for the run method, the spec states:

      This operation will block until the ORB has completed the shutdown process,

      This issue has arisen because of a change made to our fork of the JDK ORB: in the JDK ORB shutdown method we now join with all the ORB runner threads. This results in a deadlock:

      1. com.arjuna.orbportability.ORB.shutdown is a synchronized method and it calls shutdown on the JDK ORB;
      2. shutdown on the JDK ORB notifies the ORBRunner thread, which now tries to call back into a synchronized method of com.arjuna.orbportability.ORB but is blocked because the monitor is held;
      3. at this point the JDK ORB shutdown would normally return, allowing the ORBRunner thread to make progress, but a recent change means that the JDK ORB shutdown method now performs a join() on the various ORBRunner threads, so neither thread can proceed.
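
The three steps above can be reproduced in a toy model. This is only a sketch under stated assumptions: `WrapperOrb`, `postRunWork` and `runnerFinishedDuringShutdown` are hypothetical names standing in for the synchronized wrapper and the ORBRunner thread, not the real Narayana or JDK ORB classes. It also uses a timed `join` so that the program reports the blocked runner instead of hanging.

```java
import java.util.concurrent.CountDownLatch;

/** Toy model only: hypothetical stand-ins, not the Narayana or JDK ORB types. */
public class ShutdownDeadlockSketch {

    /** Plays the role of the synchronized wrapper ORB from the report. */
    static class WrapperOrb {
        private final CountDownLatch runReturned = new CountDownLatch(1);
        private Thread runner;

        /** Plays the role of ORBRunner: blocks in "run()", then calls back into the wrapper. */
        void startRunner() {
            runner = new Thread(() -> {
                try {
                    runReturned.await();      // models orb().run() returning once shutdown starts
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                postRunWork();                // step 2: needs the wrapper's monitor
            }, "orb-runner");
            runner.start();
        }

        synchronized void postRunWork() {
            // Whatever the runner does on the wrapper after run() returns.
        }

        /** Returns true if the runner finished while this method still held the monitor. */
        synchronized boolean shutdown(long joinTimeoutMillis) throws InterruptedException {
            runReturned.countDown();          // step 1: wake the runner while holding the monitor
            runner.join(joinTimeoutMillis);   // step 3: an untimed join() would never return
            return !runner.isAlive();
        }

        void awaitRunner() throws InterruptedException {
            runner.join();                    // called without the monitor, so the runner can finish
        }
    }

    public static boolean runnerFinishedDuringShutdown(long joinTimeoutMillis) {
        WrapperOrb orb = new WrapperOrb();
        orb.startRunner();
        try {
            boolean finished = orb.shutdown(joinTimeoutMillis);
            orb.awaitRunner();
            return finished;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints "runner finished during shutdown: false"
        System.out.println("runner finished during shutdown: " + runnerFinishedDuringShutdown(200));
    }
}
```

The runner can only complete once `shutdown` has released the monitor, which is why replacing the timed join with a plain `join()` would block forever.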

            [JBTM-2423] ORBRunner uses the orb after run() returns

            Michael Musgrove added a comment - I have raised JBTM-2514 to address the comment https://issues.jboss.org/browse/JBTM-2423?focusedCommentId=13109011&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13109011

            The reason for a new issue is that the underlying problem is different from this JIRA.

            Tomasz Adamski added a comment - edited

            The method runs, but at first glance I don't see an ORB shutdown performed from it.

            Michael Musgrove added a comment - tomekadamski, during the WildFly subsystem reload operation, does org.jboss.as.txn.service.ArjunaRecoveryManagerService.stop get called? If it isn't called then that is where the bug is, and if it is called then we need to debug why it doesn't clean up correctly.

            Tomasz Adamski added a comment - edited

            The issue returned in WFLY-5261. I was looking at the problem today. The problem is that the shutdown hooks are not executed. Among them is JavaIdlRCShutdown, and the lack of its execution causes the error in WFLY-5261.

            In the changes above, the destruction of the POA and the shutdown of the ORB were removed from the ORBRunner class, and this code was added to the ORB's shutdown method:

            // Ensure destroy is called on the root OA so that any pre/post destroy hooks get called
            // Normally we expect whoever called shutdown to have done this however destroy is
            // safe to call multiple times
            OA.getRootOA(this).destroy();

            The problem is that no one calls the shutdown method either, as it was executed only from the ORBRunner class. On the other hand, when we execute the shutdown method from the ORBRunner class we violate the specification, which also leads to the problems that we were solving in the first place.

            My suggestion would be to try to perform the destruction/shutdown in ORBRunner, but only of the wrappers (without delegating to the underlying ORB/POAs, as we are sure they are already destroyed by the time ORBRunner gets there).
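
A minimal sketch of what "wrapper-only" teardown could look like, assuming a wrapper that owns the shutdown hooks and a separate underlying ORB. All names here (`OrbWrapper`, `UnderlyingOrb`, `shutdownWrapperOnly`) are hypothetical and are not taken from the orbportability code; the single hook stands in for something like JavaIdlRCShutdown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins; not the real orbportability classes. */
public class WrapperOnlyTeardownSketch {

    /** Models the underlying ORB, which has already shut down by the time run() returns. */
    static class UnderlyingOrb {
        int shutdownCalls = 0;
        void shutdown() { shutdownCalls++; }
    }

    /** Models our ORB wrapper, which owns the shutdown hooks. */
    static class OrbWrapper {
        private final UnderlyingOrb delegate;
        private final List<Runnable> shutdownHooks = new ArrayList<>();

        OrbWrapper(UnderlyingOrb delegate) { this.delegate = delegate; }

        void addShutdownHook(Runnable hook) { shutdownHooks.add(hook); }

        /** Normal path: run the hooks, then delegate to the underlying ORB. */
        void shutdown() {
            runHooks();
            delegate.shutdown();
        }

        /** Post-run() path: run the hooks but never touch the underlying ORB. */
        void shutdownWrapperOnly() {
            runHooks();
        }

        private void runHooks() {
            for (Runnable hook : shutdownHooks) {
                hook.run();
            }
            shutdownHooks.clear(); // in this sketch, hooks fire at most once
        }
    }

    /** Returns {hook invocations, calls made on the underlying ORB}. */
    public static int[] demo() {
        UnderlyingOrb underlying = new UnderlyingOrb();
        OrbWrapper wrapper = new OrbWrapper(underlying);
        int[] hookRuns = {0};
        wrapper.addShutdownHook(() -> hookRuns[0]++);
        wrapper.shutdownWrapperOnly();
        return new int[] {hookRuns[0], underlying.shutdownCalls};
    }

    public static void main(String[] args) {
        int[] result = demo();
        // prints "hooks run: 1, underlying calls: 0"
        System.out.println("hooks run: " + result[0] + ", underlying calls: " + result[1]);
    }
}
```

The point of the split is that the hooks still run after run() returns, while no operation reaches an ORB that the spec says must no longer be used.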

            I'm preparing a quick test of that solution now so we can discuss it. I will test it and paste it in a few minutes if it passes the tests (and tomorrow if it doesn't).

            Michael Musgrove added a comment - tomekadamski, please will you add a comment explaining why you have reopened this JIRA?

            Michael Musgrove added a comment - The fix will be available in the next Narayana 5.1.x release. If it needs backporting then we will need to create a 5.0.6 release.

            Michael Musgrove added a comment - I added a line in orbportability/ORB.java to ensure that destroy is called, or has already been called, on our root POA wrapper. This means that if whoever called shutdown on our ORB wrapper forgets to explicitly destroy it, then we will do so, guaranteeing that our pre/post POA destroy hooks get invoked.
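
The guarantee described here depends on destroy being safe to call more than once. A minimal sketch of that idea follows, assuming a simple "already destroyed" flag. `RootOaWrapper`, `OrbWrapper` and the `hookRuns` counter are hypothetical and only illustrate the pattern; they are not the actual orbportability implementation.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-ins; not the real orbportability classes. */
public class DestroyGuardSketch {

    /** Models the root POA wrapper whose pre/post destroy hooks must run exactly once. */
    static class RootOaWrapper {
        private final AtomicBoolean destroyed = new AtomicBoolean(false);
        int hookRuns = 0;

        void destroy() {
            if (!destroyed.compareAndSet(false, true)) {
                return; // already destroyed: a repeated call is a no-op
            }
            hookRuns++; // stands in for running the pre/post destroy hooks
        }
    }

    /** Models the ORB wrapper whose shutdown makes sure the root OA was destroyed. */
    static class OrbWrapper {
        final RootOaWrapper rootOa = new RootOaWrapper();

        void shutdown() {
            rootOa.destroy();
            // ... shutting down the underlying ORB is omitted from this sketch
        }
    }

    /** Returns how many times the hooks ran after shutdown. */
    public static int hookRuns(boolean callerDestroysFirst) {
        OrbWrapper orb = new OrbWrapper();
        if (callerDestroysFirst) {
            orb.rootOa.destroy(); // the well-behaved caller
        }
        orb.shutdown();
        return orb.rootOa.hookRuns;
    }

    public static void main(String[] args) {
        // prints "1 1": the hooks run once whether or not the caller destroyed first
        System.out.println(hookRuns(true) + " " + hookRuns(false));
    }
}
```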

            Michael Musgrove added a comment - Mark, you are correct. We are calling destroy on our own wrapper which does indeed do some extra work. I will revisit the fix, thanks.

            Michael Musgrove added a comment - Since the run loop for JavaIdlRCServiceInit._orb has finished, some code somewhere must have already called shutdown on it. The spec says that shutdown calls destroy on the root POA, which in turn calls destroy on its immediate descendants. So when the run loop finishes, the code I have removed should effectively already have been called (i.e. both the shutdown on the ORB and the destroy on the object adapter).

            The relevant bits of the spec I refer to are 4.2.5.4 shutdown:

            Additionally in systems that have Portable Object Adapters (see Chapter 11)
            shutdown behaves as if POA::destroy is called on the Root POA with its first
            parameter set to TRUE and the second parameter set to the value of the
            wait_for_completion parameter that shutdown is invoked with.

            and

            And POA::destroy will call destroy on all of its immediate descendants.
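
The cascade in the quoted spec text can be illustrated with a small model. This is a sketch only: `Orb` and `Poa` are hypothetical classes rather than the org.omg types, and destroying children before their parent is a choice made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the shutdown -> POA destroy cascade; not the org.omg classes. */
public class ShutdownCascadeSketch {

    static class Poa {
        final String name;
        final List<Poa> children = new ArrayList<>();

        Poa(String name) { this.name = name; }

        Poa addChild(String childName) {
            Poa child = new Poa(childName);
            children.add(child);
            return child;
        }

        /** Destroys the immediate descendants (which recurse), then records this POA. */
        void destroy(boolean etherealizeObjects, boolean waitForCompletion, List<String> log) {
            for (Poa child : children) {
                child.destroy(etherealizeObjects, waitForCompletion, log);
            }
            log.add(name + "(etherealize=" + etherealizeObjects + ", wait=" + waitForCompletion + ")");
        }
    }

    static class Orb {
        final Poa rootPoa = new Poa("RootPOA");

        /** Behaves as if destroy(TRUE, waitForCompletion) were called on the root POA. */
        List<String> shutdown(boolean waitForCompletion) {
            List<String> log = new ArrayList<>();
            rootPoa.destroy(true, waitForCompletion, log);
            return log;
        }
    }

    /** Builds a small POA tree, shuts the ORB down and returns the destroy log. */
    public static List<String> demo(boolean waitForCompletion) {
        Orb orb = new Orb();
        orb.rootPoa.addChild("ChildA").addChild("GrandChild");
        orb.rootPoa.addChild("ChildB");
        return orb.shutdown(waitForCompletion);
    }

    public static void main(String[] args) {
        demo(false).forEach(System.out::println);
    }
}
```

In this model a single shutdown call reaches every POA in the tree, which is the basis for the argument that the removed destroy and shutdown calls were redundant for the underlying ORB.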

            Mark Little added a comment - In the change you've removed ...

            try
            - {
            -     if (JavaIdlRCServiceInit._oa != null)
            -         JavaIdlRCServiceInit._oa.destroy();
            -
            -     if (JavaIdlRCServiceInit._orb != null)
            -         JavaIdlRCServiceInit._orb.shutdown();
            - }
            - catch (Exception ex)
            - {
            - }

            Have you checked through the code paths to make sure we're not relying on OA destroy or ORB shutdown to trigger something else?

              tadamski@redhat.com Tomasz Adamski
              rhn-engineering-mmusgrov Michael Musgrove