[WFCORE-273] ServerInventory#reconnectServer should take auto-start into account

Type: Task
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: Management
Labels:
- domain-mode

When reconnecting to stopped (but not removed) servers, ServerInventory should take the auto-start property into account. Otherwise the process is simply removed, yet it is started again on the next :reload of the HC.

Is related to: AS7-5662 Cannot start a managed server anymore if it's failed and the host controller is reloaded/restarted (Resolved)

Ken Wills added a comment - 2016/03/28 5:11 PM - edited

Actually, I think I may have been able to figure out how to cause this (well, I mean "this" as in there are probably other states like this):

(1) Start master & slave HCs
(2) kill master HC
(3) connect to slave HC and start a non-auto-start server
(4) restart master HC
(5) server shows up in master as disabled, but is running.

Edit: never mind, it now seems to be working. I think it was just the registration retry delay.


Ken Wills added a comment - 2016/03/28 4:59 PM - edited

Here are the outcomes with --restart-servers:

#	auto-start	reload --restart-servers=	Current status	Outcome	Server origin (existing config / added with CLI)	Comments
1	default (true)	not specified	running	stopped & started	(existing config)
2	default (true)	not specified	not running	started	(existing config)
3	default (true)	false	not running	started	(existing config)	Should this be started with --restart-servers=false?
4	default (true)	true	running	stopped & started	(existing config)
5	false	not specified	running	stopped & not started	(existing config)
6	false	true	running	stopped & not started	(existing config)
7	false	false	running	not stopped & still running	(existing config)
8	true	false	not running	started	(added new server)	Should this be started with --restart-servers=false? (same as 3)
9	false	false	not running	not started	(added new server)
10	false	false	running	running	(added new server & started manually)
11	false	true	running	stopped & not started	(added new server & started manually)

So I think we'd want to think about 3 and 11

(3) If an auto-start server has been stopped, and restart-servers=false, should it restart the stopped server? (I think no.)
(11) For a new auto-start=false server that is currently running, restart-servers=true stops the server but does not start it again. This might be OK though, as it was explicitly added with auto-start=false. (Edit: never mind 11, this behavior makes sense.)
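
The table above can be condensed into a small rule. The following Python sketch is only an illustration inferred from the 11 observed rows; it is not the ServerInventory code. The function name `reload_outcome`, the tuple encoding of outcomes, and the treatment of an unspecified --restart-servers as true are assumptions, and any combination not listed in the table is an extrapolation.

```python
from typing import Optional, Tuple


def reload_outcome(auto_start: bool,
                   restart_servers: Optional[bool],
                   running: bool) -> Tuple[bool, bool]:
    """Return (stopped_by_reload, running_after_reload) for one server."""
    # Assumption: "not specified" behaves like true (consistent with rows 1 and 5).
    restart = restart_servers is not False
    if running and not restart:
        # Rows 7 and 10: the server is left alone and keeps running.
        return (False, True)
    # Otherwise a running server is stopped, and auto-start alone decides
    # whether the server is running afterwards (rows 1-6, 8, 9, 11).
    return (running, auto_start)


# Observed rows: number -> ((auto_start, restart_servers, running), outcome).
# restart_servers=None stands for "not specified".
OBSERVED = {
    1: ((True, None, True), (True, True)),     # stopped & started
    2: ((True, None, False), (False, True)),   # started
    3: ((True, False, False), (False, True)),  # started
    4: ((True, True, True), (True, True)),     # stopped & started
    5: ((False, None, True), (True, False)),   # stopped & not started
    6: ((False, True, True), (True, False)),   # stopped & not started
    7: ((False, False, True), (False, True)),  # not stopped & still running
    8: ((True, False, False), (False, True)),  # started
    9: ((False, False, False), (False, False)),  # not started
    10: ((False, False, True), (False, True)),   # running
    11: ((False, True, True), (True, False)),    # stopped & not started
}

# Rows where the inferred rule disagrees with the observation.
mismatches = [row for row, (args, expected) in OBSERVED.items()
              if reload_outcome(*args) != expected]
print(mismatches)
```

Seen this way, rows 3 and 8 stand out: a stopped auto-start=true server is started even though --restart-servers=false, because in this reading the flag only affects servers that are already running.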


Brian Stansberry added a comment - 2016/03/28 4:25 PM

Making a list sounds good. I won't opine on "auto-start=true and not running" until I see the list, otherwise the list will just point out all the contradictions in whatever I say.

One other important factor in our decisions is "what do we do now?" If something isn't clearly broken, not changing behavior has a lot of merit.


Ken Wills added a comment - 2016/03/28 4:09 PM

A server that was added with auto-start=false and then started manually is not started again after an HC reload with --restart-servers=true. I think such servers should be restarted if they were running.

What about the reverse case, if auto-start=true and not running, should it be started with --restart-servers? (My initial reaction is no.) This seems to be overlapping into update-auto-start-with-server-status = true.

I'll make a list of the states and outcomes for auto-start, --restart-servers=true|false and update-auto-start-with-server-status, and review it.


Ken Wills added a comment - 2016/03/28 3:40 PM

bstansberry Thanks for the info. I'd checked the zombie cases on reconnect, but not reload with --restart-servers=true. I've reopened the issue to check those, and then I'll check in with ehugonne1@redhat.com.


Brian Stansberry added a comment - 2016/03/28 3:34 PM

kwills@redhat.com, given the statement in the description "Otherwise the process will just get removed, however started with the next :reload of the HC", I think the original "bug" scenario was this:

1) Start an HC "foo" with a server "bar" that has auto-start=true in its server-config
2) /host=foo/server=bar:stop
3) reload --host=foo --restart-servers=false

When step 3 is done, server "bar" would still be stopped. Because the --restart-servers param is set to false, the user is indicating the reload is not meant to change what servers are running. So, the auto-start=true should not take precedence over the fact the server was stopped.

Arguably the same could be said with --restart-servers=true as well, i.e. that a reload is not the same thing as a new boot.

I think the thing to do here is to have a solution that fits best with the WFCORE-297 fix, but adds clear semantics for the --restart-servers=false case. Double check with ehugonne1@redhat.com, but I believe what was done with 297 was adding the update-auto-start-with-server-status attribute: when it is true, the HC "remembers" whether a server is stopped or started and uses that when it restarts. If that attribute isn't true, the HC doesn't "remember" and all that matters is what's in host.xml. My instinct is that the right thing to do is to be consistent when doing a reload as well. A reload is really just a faster restart that preserves the same process. Its behavior should be as close to restart as possible.

It's quite likely that WFCORE-297 has fixed this as well.

BTW, regardless of these attributes, in the "reload --host=foo --restart-servers=false" case, if a server was running before the reload, the HC needs to properly trigger a reconnect. We can never have a situation where after a reload a disconnected server is left floating unmanaged as a zombie.
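
The "remember" semantics described in this comment can be sketched in a few lines of Python. This is a minimal illustration of the rule as stated here, not WildFly code: the function name `should_start_on_restart` and its parameters are hypothetical, and it assumes the remembered state is simply whether the server was last known to be running.

```python
def should_start_on_restart(auto_start: bool,
                            update_auto_start_with_server_status: bool,
                            last_known_running: bool) -> bool:
    """Decide whether the HC starts a server when the HC itself (re)starts.

    Hypothetical helper: names and signature are illustrative only.
    """
    if update_auto_start_with_server_status:
        # The HC "remembers": the last known server status wins.
        return last_known_running
    # The HC does not "remember": only the auto-start value in host.xml matters.
    return auto_start


# auto-start=false, but the server was running and the HC remembers it.
print(should_start_on_restart(False, True, True))
# auto-start=true, the server was stopped, and nothing is remembered.
print(should_start_on_restart(True, False, False))
```

Under the consistency argument made above, a reload would apply the same decision as a restart, with the separate requirement that a server still running after the reload is reconnected rather than left unmanaged.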


Ken Wills added a comment - 2016/03/24 6:11 PM

After quite a bit of testing on this, either the bug has been fixed by one of the other changes in this area, or I'm just not seeing it. A server marked with auto-start="false" that has been started is shown as stopped and as disabled after an HC reload, after a disconnect/reconnect of the HC, and in various other combinations, on both the master HC and slaves.

It appears the attribute update-auto-start-with-server-status was added to track the running states across reloads, and with this enabled the last running state is reflected correctly (with auto-start=false, update-auto-start-with-server-status=true and server = RUNNING, the server is correctly restarted after an HC reload).

bstansberry Do you happen to remember anything else about this bug?


Brian Stansberry added a comment - 2014/02/07 9:30 PM

Scheduling for 8.0.1 as this is a bug fix. But moving to 9 is ok.


Assignee: Ken Wills

Reporter: Emanuel Muckenhuber (Inactive)

Votes: 0

Watchers: 4

Created: 2012/10/17 11:24 AM

Updated: 2021/10/24 5:48 AM
