[RHEL-63171] Docker resource agent always fails to stop when there is no docker running - Red Hat Issue Tracker

Type: Bug
Resolution: Not a Bug
Priority: Minor
Fix Version/s: None
Affects Version/s: rhel-7.9.z
Component/s: pacemaker
Labels:
None

Regression:
No
Severity:
Moderate

Pool Team:

rhel-sst-high-availability
Sub-System Group:

ssg_filesystems_storage_and_HA

Story Points:
1
Blocked:
False
Blocked Reason:
None
Product Documentation Required:
None
Sprint:
None

Preliminary Testing:
None
Test Coverage:
None

Experience:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Planning:
None

What were you trying to do that didn't work?
I ran into a very unpleasant situation with an HA cluster that has a dockerd
resource and container resources: in a seemingly correct configuration, the
secondary node ends up in an infinite reboot loop.

What is the impact of this issue to you?
Infinite reboots of the secondary node, causing outages of the resources that
the cluster tries to run and that fail once the secondary node comes back.

How reproducible is this bug?

Always at the time; I cannot easily reproduce it again because the cluster is in production.

Steps to reproduce:

This happened on a production cluster running RHEL 7.9.

A two-node cluster with a systemd dockerd resource and multiple container resources that have with/after dependencies on dockerd; everything is running correctly.
Both nodes are forcefully restarted.
One of the nodes comes up correctly and becomes the DC.
The other node is unable to start docker and ends up in an endless loop because the container resources fail to stop.
Shutting off the secondary node and setting it to standby on the DC, hoping it will come back up because it "would not host any resources".
Powering it on, with no change in the situation.
Putting the secondary node into maintenance mode to allow it to start.
Starting dockerd on the secondary node.
Taking the secondary node out of maintenance mode; everything runs correctly.
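
The last three steps (maintenance, start dockerd, un-maintenance) are the workaround that actually recovered the node. A minimal sketch of that sequence follows. It only builds the command lines and does not run them. The node name `node2` and the unit name `docker` are hypothetical, and the `pcs node maintenance` / `pcs node unmaintenance` subcommands are an assumption about the installed pcs version; the report does not say which commands were used.

```python
def recovery_commands(node, docker_unit="docker"):
    """Return the workaround as (where_to_run, argv) pairs.

    Assumptions (not from the report): pcs offers `node maintenance` and
    `node unmaintenance`, and dockerd is the systemd unit `docker_unit`.
    """
    return [
        # Maintenance mode leaves the node's resources unmanaged, so the
        # cluster does not attempt the stop actions that were failing.
        ("any cluster node", ["pcs", "node", "maintenance", node]),
        # With the cluster hands-off, start the docker daemon by hand.
        (node, ["systemctl", "start", docker_unit]),
        # Hand the node back to the cluster once docker is running.
        ("any cluster node", ["pcs", "node", "unmaintenance", node]),
    ]


if __name__ == "__main__":
    # "node2" is a placeholder for the secondary node's name.
    for where, argv in recovery_commands("node2"):
        print(f"[{where}] {' '.join(argv)}")
```

Printing the plan rather than executing it keeps the sketch safe to run anywhere; on a real cluster each command would be reviewed and run by hand.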

Expected result:
The container resource should not fail its stop action when there is no docker
(socket or remote). I know I can set this manually, but if that is the
recommendation, it should be the default for the container resource.
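
To make the requested behaviour concrete, here is a small sketch of a stop action that reports success when no docker daemon answers. It is not the code of the actual docker resource agent (which is a shell script) and not a proposed patch; `stop_container` and `docker_daemon_reachable` are hypothetical names. It assumes that an unreachable daemon means the container cannot be running, which is the reporter's premise and is not confirmed by this issue (it was resolved as Not a Bug).

```python
import subprocess

# Return codes from the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def docker_daemon_reachable(run=subprocess.run):
    """Return True if `docker info` exits 0, i.e. a daemon answered."""
    try:
        result = run(["docker", "info"],
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        # No docker client installed, so no daemon can be queried.
        return False
    return result.returncode == 0


def stop_container(name, run=subprocess.run):
    """Hypothetical stop action: succeed when there is no docker to ask."""
    if not docker_daemon_reachable(run):
        # ASSUMPTION (the reporter's premise): no daemon, no running
        # container, so the resource is treated as already stopped.
        return OCF_SUCCESS
    result = run(["docker", "stop", name],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return OCF_SUCCESS if result.returncode == 0 else OCF_ERR_GENERIC
```

The `run` parameter exists only so the logic can be exercised without a docker installation; by default it is `subprocess.run`.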

Actual results:
Infinite reboots of the secondary node, causing outages of the resources that
the cluster tries to run and that fail once the secondary node comes back.

Other info:

My guess is that this should be fixed in resource-agents, but I am filing it against pacemaker in case it is to be solved differently.

Assignee: Kenneth Gaillot (Inactive)
Reporter: Michal Nováček
Developer: Kenneth Gaillot (Inactive)
QA Contact: Cluster QE
Votes: 0
Watchers: 2
Created: 2024/10/21 10:32 AM
Updated: 2024/10/24 3:09 PM
Resolved: 2024/10/24 3:09 PM
