Bug
Resolution: Not a Bug
Minor
rhel-7.9.z
Moderate
rhel-sst-high-availability
ssg_filesystems_storage_and_HA
What were you trying to do that didn't work?
I found a very unpleasant situation with an HA cluster involving dockerd and
container resources: in a seemingly correct configuration, the secondary node
ends up in an infinite reboot loop.
What is the impact of this issue to you?
Infinite reboot of the secondary node, causing outages of the resources that
the cluster tries to run and that fail once the secondary node comes back.
How reproducible is this bug?
Always at the time; I cannot easily reproduce it again because the cluster is in production.
Steps to reproduce:
This happened on a production cluster running RHEL 7.9:
- a two-node cluster with a systemd dockerd resource and multiple container resources with with/after dependencies on dockerd, everything running correctly
- both nodes are forcefully restarted
- one of the nodes comes up correctly and becomes the DC
- the other node is not able to start docker and ends up in an endless loop because the container resources fail to stop
- shutting off the secondary node and setting it to standby on the DC, hoping it would come back up as it "would not host any resources"
- powering it on did not change the situation
- putting the secondary node into maintenance mode to allow it to start
- starting dockerd on the secondary node
- taking the secondary node out of maintenance, everything runs correctly
Expected result:
The container resource should not fail its stop action when there is no docker
(socket or remote). I know I can set this manually, but if that is the
recommendation, it should be the default for the container resource.
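To make the expectation concrete, here is a minimal Python sketch of the stop logic I have in mind. It is not the actual resource agent code (the real agent is a shell script); the function names `docker_reachable` and `stop_container` are hypothetical, and only the OCF return codes (0 for success, 1 for a generic error) are taken from the OCF convention. It also assumes that no container can be running when the daemon is unreachable, which may not hold for every docker configuration.

```python
import shutil
import subprocess

# OCF return codes used by resource agents.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def docker_reachable(docker_cmd="docker", timeout=10):
    """Return True if the docker CLI exists and the daemon answers 'docker info'."""
    if shutil.which(docker_cmd) is None:
        return False
    try:
        result = subprocess.run(
            [docker_cmd, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def stop_container(name, reachable=docker_reachable, run=subprocess.run):
    """Hypothetical stop action: report success when there is no docker to talk to."""
    if not reachable():
        # Assumption: without a reachable daemon the container is treated as stopped,
        # so the stop action succeeds instead of failing.
        return OCF_SUCCESS
    result = run(
        ["docker", "stop", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return OCF_SUCCESS if result.returncode == 0 else OCF_ERR_GENERIC
```

With this behaviour, a node that boots without a working dockerd would report its container resources as stopped rather than returning a failed stop to the cluster.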
Actual results:
Infinite reboot of the secondary node, causing outages of the resources that
the cluster tries to run and that fail once the secondary node comes back.
Other info:
My guess is that this should be fixed in resource-agents, but I am filing it against pacemaker in case it is to be solved differently.