kwills@redhat.com, given the statement in the description "Otherwise the process will just get removed, however started with the next :reload of the HC", I think the original "bug" scenario was this:
1) Start an HC "foo" with a server "bar" that has auto-start=true in its server-config
2) /host=foo/server=bar:stop
3) reload --host=foo --restart-servers=false
When step 3 is done, server "bar" would still be stopped. Because the --restart-servers param is set to false, the user is indicating the reload is not meant to change what servers are running. So, the auto-start=true should not take precedence over the fact the server was stopped.
Arguably the same could be said with --restart-servers=true as well, i.e. that a reload is not the same thing as a new boot.
I think the thing to do here is to have a solution that fits best with the WFCORE-297 fix, but adds clear semantics for the --restart-servers=false case. Double check with ehugonne1@redhat.com, but I believe what was done with 297 was adding the update-auto-start-with-server-status=true setting, which causes the HC to "remember" whether a server is stopped or started and then use that state when the HC restarts. If that attribute isn't true, the HC doesn't "remember", and all that matters is what's in host.xml. My instinct is that the right thing to do is to be consistent when doing a reload as well. A reload is really just a faster restart that preserves the same process. Its behavior should be as close to a restart as possible.
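To make the proposed semantics concrete, here is a minimal sketch of the decision as I understand it. It is only a model of the rule described above, not WildFly code: the function name and the last_known_running parameter are hypothetical, and I am assuming that the "remembered" state simply overrides auto-start whenever update-auto-start-with-server-status is true.

```python
from typing import Optional


def should_start_after_reload(
    auto_start: bool,
    update_auto_start_with_server_status: bool,
    last_known_running: Optional[bool],
) -> bool:
    """Hypothetical model: should the HC start a server after a reload or restart?

    auto_start: the auto-start value from the server-config in host.xml.
    update_auto_start_with_server_status: whether the HC "remembers" server state.
    last_known_running: the remembered state, or None if nothing was recorded.
    """
    if update_auto_start_with_server_status and last_known_running is not None:
        # Assumption: the remembered state wins over the host.xml setting.
        return last_known_running
    # Otherwise only the configuration in host.xml matters.
    return auto_start
```

Under this model, the scenario in steps 1-3 (auto-start=true, server explicitly stopped) leaves "bar" stopped only when the HC is remembering state; without the attribute, the reload behaves like a fresh boot and honors auto-start.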
It's quite likely that WFCORE-297 has fixed this as well.
BTW, regardless of these attributes, in the "reload --host=foo --restart-servers=false" case, if a server was running before the reload, the HC needs to properly trigger a reconnect. We can never have a situation where after a reload a disconnected server is left floating unmanaged as a zombie.
Actually, I think I may have figured out how to cause this (by "this" I mean one such inconsistent state; there are probably others like it):
(1) Start master & slave HCs
(2) kill master HC
(3) connect to slave HC and start a non-auto-start server
(4) restart master HC
(5) server shows up in master as disabled, but is running.
Edit: never mind, it now seems to be working. I think it was just the registration retry delay.