RHEL-63171

Docker resource agent always fails to stop when there is no docker running

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Minor
    • rhel-7.9.z
    • pacemaker
    • No
    • Moderate
    • rhel-sst-high-availability
    • ssg_filesystems_storage_and_HA
    • 1
    • False

      What were you trying to do that didn't work?
      I ran into a very unpleasant situation with an HA cluster involving dockerd and
      container resources: a seemingly correct configuration caused an infinite
      reboot loop of the secondary node.

      What is the impact of this issue to you?
      Infinite reboot of the secondary node, causing outages of resources that the
      cluster tries to run and that fail once the secondary node comes back.

      How reproducible is this bug?

      Always at the time; I cannot easily reproduce it again because the cluster is in production.

      Steps to reproduce:

      This happened on a production cluster running RHEL 7.9.

      • a two-node cluster with a systemd dockerd resource and multiple container resources that have with/after dependencies on dockerd, everything running correctly
      • both nodes are forcefully restarted
      • one of the nodes comes up correctly and becomes DC
      • the other node is unable to start docker and ends up in an endless loop because the container resources fail to stop
      • shutting off the secondary node and setting it to standby on the DC, hoping it will come back up as it "would not host any resources"
      • powering it on, with no change in the situation
      • putting the secondary node into maintenance to allow it to start
      • starting dockerd on the secondary node
      • taking the secondary node out of maintenance, after which everything runs correctly
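
The loop in the steps above can be illustrated with a toy model. This is only a sketch, not pacemaker or resource-agent code: `container_stop` and `boots_until_stable` are hypothetical names, and the model assumes that a failed stop action leads to the node being fenced (pacemaker's default `on-fail` for stop operations when fencing is enabled) and that docker is in the same state after every reboot.

```python
# Toy model of the reboot loop described in the steps; not pacemaker code.
OCF_SUCCESS = 0       # standard OCF return code for success
OCF_ERR_GENERIC = 1   # standard OCF return code for a generic error


def container_stop(docker_running: bool) -> int:
    """Hypothetical stand-in for the container resource's stop action.

    Models the reported behaviour: stop fails when no docker is running.
    """
    return OCF_SUCCESS if docker_running else OCF_ERR_GENERIC


def boots_until_stable(docker_running_at_boot: bool, max_boots: int = 5):
    """Return the boot number at which the node stays up, or None if the
    node is still being fenced after max_boots boots."""
    for boot in range(1, max_boots + 1):
        if container_stop(docker_running_at_boot) == OCF_SUCCESS:
            return boot
        # Assumption: a failed stop gets the node fenced (rebooted), and
        # the next boot starts in the same state, so the loop repeats.
    return None
```

In this model the only way out is to change the state in which the stop action runs, which matches the workaround in the steps: starting dockerd while the node is in maintenance.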

      Expected result:
      The container resource should not fail the stop action when there is no docker
      (socket or remote). I know I can set this manually, but if that is to be the
      recommendation, it should be the default for the container resource.
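
The expected behaviour can be sketched as a small decision function. This mirrors the request in this report only; it is not the code of the docker resource agent, and `expected_stop` and its parameters are hypothetical names. Treating an unreachable daemon as "container already stopped" is the reporter's assumption and may not be safe in every setup, for example with a remote daemon.

```python
from typing import Callable

OCF_SUCCESS = 0       # standard OCF return code for success
OCF_ERR_GENERIC = 1   # standard OCF return code for a generic error


def expected_stop(
    daemon_reachable: bool,
    container_running: bool,
    stop_container: Callable[[], bool],
) -> int:
    """Hypothetical stop action implementing the requested behaviour."""
    if not daemon_reachable:
        # Assumption from the report: with no docker (socket or remote),
        # the container is treated as already stopped.
        return OCF_SUCCESS
    if not container_running:
        # Stopping an already stopped resource counts as success in OCF.
        return OCF_SUCCESS
    # Daemon is up and the container runs: really try to stop it.
    return OCF_SUCCESS if stop_container() else OCF_ERR_GENERIC
```

With this logic, a node that boots without dockerd would report a successful stop for every container resource instead of an error.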

      Actual results: 
      Infinite reboot of the secondary node, causing outages of resources that the
      cluster tries to run and that fail once the secondary node comes back.

      Other info:

      My guess is that this should be fixed in resource-agents, but I am filing it against pacemaker in case it is to be solved differently.

              kgaillot@redhat.com Kenneth Gaillot
              mnovacek@redhat.com Michal Nováček
              Kenneth Gaillot
              Cluster QE
              Votes: 0
              Watchers: 2