ServerInventory#reconnectServer should take auto-start into account

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major
    • Component: Management

      When reconnecting to stopped (but not removed) servers, the ServerInventory should take the auto-start property into account. Otherwise the process will just get removed, however started with the next :reload of the HC.

            [WFCORE-273] ServerInventory#reconnectServer should take auto-start into account

            Ken Wills added a comment - - edited

            Actually, I think I may have been able to figure out how to cause this (well, I mean "this" as in there are probably other states like this):

            (1) Start master & slave HCs
            (2) kill master HC
            (3) connect to slave HC and start a non-auto-start server
            (4) restart master HC
            (5) server shows up in master as disabled, but is running.

            Edit: never mind, it now seems to be working. I think it was just the registration retry delay.

            Ken Wills added a comment - - edited

            Here are the outcomes with --restart-servers:

             #  | auto-start     | reload --restart-servers= | Current status | Outcome                     | Existing config / added with CLI    | Comments
            ----+----------------+---------------------------+----------------+-----------------------------+-------------------------------------+---------
             1  | default (true) | not specified             | running        | stopped & started           | existing config                     |
             2  | default (true) | not specified             | not running    | started                     | existing config                     |
             3  | default (true) | false                     | not running    | started                     | existing config                     | Should this be started with --restart-servers=false?
             4  | default (true) | true                      | running        | stopped & started           | existing config                     |
             5  | false          | not specified             | running        | stopped & not started       | existing config                     |
             6  | false          | true                      | running        | stopped & not started       | existing config                     |
             7  | false          | false                     | running        | not stopped & still running | existing config                     |
             8  | true           | false                     | not running    | started                     | added new server                    | Should this be started with --restart-servers=false? (same as 3)
             9  | false          | false                     | not running    | not started                 | added new server                    |
             10 | false          | false                     | running        | running                     | added new server & started manually |
             11 | false          | true                      | running        | stopped & not started       | added new server & started manually |

            So I think we need to look at cases 3 and 11:

            (3) If an auto-start server has been stopped, and restart-servers=false, should it restart the stopped server? (I think no.)
            (11) For a new auto-start=false server that is currently running, restart-servers=true stops the server but doesn't start it again. This might be OK though, as it was explicitly added with auto-start=false. (Edit: never mind 11, this behavior makes sense.)
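
The outcomes in the table above can be checked against a single rule. The following is only a sketch in Python, not WildFly code: the names `Row`, `OBSERVED` and `observed_outcome` are hypothetical, "default (true)" is recorded as `True`, and treating an unspecified --restart-servers as true is an inference from rows 1 and 5 of the table.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    auto_start: bool                 # "default (true)" is recorded as True
    restart_servers: Optional[bool]  # None means the flag was not specified
    running: bool                    # status before the reload
    stopped: bool                    # was the server stopped by the reload?
    running_after: bool              # is it running once the reload finishes?


# Rows 1-11 of the table, in order.
OBSERVED = [
    Row(True,  None,  True,  True,  True),
    Row(True,  None,  False, False, True),
    Row(True,  False, False, False, True),
    Row(True,  True,  True,  True,  True),
    Row(False, None,  True,  True,  False),
    Row(False, True,  True,  True,  False),
    Row(False, False, True,  False, True),
    Row(True,  False, False, False, True),
    Row(False, False, False, False, False),
    Row(False, False, True,  False, True),
    Row(False, True,  True,  True,  False),
]


def observed_outcome(auto_start, restart_servers, running):
    """Return (stopped, running_after) under the rule inferred from the table."""
    # Assumption: an unspecified flag behaves like true (rows 1 and 5).
    restart = True if restart_servers is None else restart_servers
    # A running server is stopped only when servers are being restarted.
    stopped = running and restart
    # A server that was left alone keeps running; otherwise auto-start decides.
    running_after = (running and not stopped) or auto_start
    return stopped, running_after


# Collect every table row the inferred rule fails to reproduce.
mismatches = [
    row for row in OBSERVED
    if observed_outcome(row.auto_start, row.restart_servers, row.running)
    != (row.stopped, row.running_after)
]
print(mismatches)
```

Under this reading, a server that is not running is started whenever auto-start is true, regardless of --restart-servers, which is exactly what cases 3 and 8 question.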

            Brian Stansberry added a comment -

            Making a list sounds good. I won't opine on "auto-start=true and not running" until I see the list, otherwise the list will just point out all the contradictions in whatever I say.

            One other important factor in our decisions is "what do we do now?" If something isn't clearly broken, not changing behavior has a lot of merit.

            Ken Wills added a comment -

            A server that is added with auto-start=false and then started manually doesn't get started again after an HC reload with --restart-servers=true. I think such servers should get restarted if they were running.

            What about the reverse case: if auto-start=true and the server is not running, should it be started with --restart-servers? (My initial reaction is no.) This seems to overlap with update-auto-start-with-server-status = true.

            I'll make a list of the states & outcomes with auto-start, --restart-servers=true|false and update-auto-start-with-server-status, and review.

            Ken Wills added a comment -

            bstansberry Thanks for the info. I'd checked the zombie cases on reconnect, but not reload with --restart-servers=true. I've reopened this to check those, and then I'll check in with ehugonne1@redhat.com.

            Brian Stansberry added a comment -

            kwills@redhat.com, given the statement in the description "Otherwise the process will just get removed, however started with the next :reload of the HC", I think the original "bug" scenario was this:

            1) Start an HC "foo" with a server "bar" that has auto-start=true in its server-config
            2) /host=foo/server=bar:stop
            3) reload --host=foo --restart-servers=false

            When step 3 is done, server "bar" would still be stopped. Because the --restart-servers param is set to false, the user is indicating the reload is not meant to change what servers are running. So, the auto-start=true should not take precedence over the fact the server was stopped.

            Arguably the same could be said with --restart-servers=true as well, i.e. that a reload is not the same thing as a new boot.

            I think the thing to do here is to have a solution that fits best with the WFCORE-297 fix, but adds in clear semantics for the --restart-servers=false case. Double check with ehugonne1@redhat.com, but I believe what was done with 297 was adding update-auto-start-with-server-status = true to cause the HC to "remember" whether a server is stopped/started and then use that when it restarts. And if that attribute isn't true, then the HC doesn't "remember" and all that matters is what's in host.xml. My instinct is that the right thing to do is to be consistent when doing a reload as well. A reload is really just a faster restart that preserves the same process. Its behavior should be as close to restart as possible.

            It's quite likely that WFCORE-297 has fixed this as well.

            BTW, regardless of these attributes, in the "reload --host=foo --restart-servers=false" case, if a server was running before the reload, the HC needs to properly trigger a reconnect. We can never have a situation where after a reload a disconnected server is left floating unmanaged as a zombie.
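
One possible reading of the semantics proposed in the comment above, written as a small Python decision function. This is a sketch under stated assumptions, not the WildFly implementation: the function name `should_run_after_reload` and its parameters are hypothetical, and treating --restart-servers=true like a restart is only one interpretation of "as close to restart as possible".

```python
def should_run_after_reload(auto_start: bool,
                            update_auto_start_with_server_status: bool,
                            was_running: bool,
                            restart_servers: bool) -> bool:
    """Return True if the server should be running once the HC reload completes."""
    if not restart_servers:
        # The reload is not meant to change what is running: a running server
        # is reconnected (never left as a zombie), a stopped one stays stopped.
        return was_running
    if update_auto_start_with_server_status:
        # The HC "remembers" the last status and reuses it, as on a restart.
        return was_running
    # Otherwise the HC does not remember; only auto-start in host.xml matters.
    return auto_start


# The "foo"/"bar" scenario: auto-start=true, server stopped by the user,
# then reload --host=foo --restart-servers=false.
bar_runs = should_run_after_reload(
    auto_start=True,
    update_auto_start_with_server_status=False,
    was_running=False,
    restart_servers=False,
)
print(bar_runs)
```

In this sketch the stopped server "bar" stays stopped after the reload, which is the behavior the comment argues for in the --restart-servers=false case.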

            Ken Wills added a comment -

            After doing quite a bit of testing on this, either the bug has been fixed by one of the other changes in this area, or I'm just not seeing it. A server marked with auto-start="false" that has been started is shown as stopped and disabled after an HC reload, after a disconnect / reconnect of the HC, and in various other combinations, on both the master HC and slaves.

            It appears the attribute update-auto-start-with-server-status was added to track the running state across reloads, and with it enabled the last running state is reflected correctly (with auto-start=false, update-auto-start-with-server-status = true and server = RUNNING, the server is correctly restarted after an HC reload).

            bstansberry Do you happen to remember anything else about this bug?
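
The tracking behavior described in this comment can be modeled roughly as follows. This is a toy Python model, not WildFly code: the class `ServerConfig` and its methods are hypothetical, and the idea that the attribute works by rewriting auto-start on every start and stop is an assumption based on the attribute name and the observations above.

```python
from dataclasses import dataclass


@dataclass
class ServerConfig:
    auto_start: bool
    update_auto_start_with_server_status: bool = False
    running: bool = False

    def start(self) -> None:
        self.running = True
        if self.update_auto_start_with_server_status:
            # Assumption: the HC records that the server was started.
            self.auto_start = True

    def stop(self) -> None:
        self.running = False
        if self.update_auto_start_with_server_status:
            # Assumption: the HC records that the server was stopped.
            self.auto_start = False

    def reload_host_controller(self) -> None:
        # Restarting reload: the server is stopped, then auto-start alone
        # decides whether it comes back. The stored value is not modified.
        self.running = self.auto_start


# auto-start=false with tracking enabled: a manually started server returns.
tracked = ServerConfig(auto_start=False, update_auto_start_with_server_status=True)
tracked.start()
tracked.reload_host_controller()

# auto-start=false without tracking: the server is shown as stopped.
untracked = ServerConfig(auto_start=False)
untracked.start()
untracked.reload_host_controller()

print(tracked.running, untracked.running)
```

The untracked case matches the "stopped & not started" outcome reported for auto-start=false servers, while the tracked case matches the restart seen with update-auto-start-with-server-status = true.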

            Brian Stansberry added a comment -

            Scheduling for 8.0.1 as this is a bug fix. But moving to 9 is ok.

              kwills@redhat.com Ken Wills
              emuckenhuber_jira Emanuel Muckenhuber (Inactive)