RHEL-7601

fence_aws: fenced node always performs a graceful shutdown

    • Important
    • rhel-sst-high-availability
    • ssg_filesystems_storage_and_HA
    • 8

      Description of problem:

      Neither a delay attribute nor a pcmk_delay_max attribute properly prevents a fence race in an AWS Pacemaker cluster.

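      This may be because the EC2 StopInstances endpoint initiates a graceful shutdown rather than a hard power-off, which presumably leaves the fenced node running long enough to carry out its own delayed fence action against its peer. I have yet to find a way to perform a hard power-off of an EC2 instance.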

      https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_StopInstances.html


      Version-Release number of selected component (if applicable):

      fence-agents-aws-4.2.1-30.el8_1.1.noarch
      pacemaker-2.0.2-3.el8.x86_64


      How reproducible:

      Always so far. The issue may be avoidable with a long enough delay, but that's not practical in production.


      Steps to Reproduce:
      1. In a two-node cluster, configure one fence_aws stonith device for each node. The issue can also be reproduced with a single shared stonith device and pcmk_delay_max, but using one device per node lets a static delay be set for a single node, which makes the results consistent.

      2. Set delay=60 on one of the stonith devices.

      3. Simulate a heartbeat network failure between nodes.
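
Steps 1 and 2 could be scripted roughly as follows. This is only a sketch: it builds `pcs stonith create` command lines and prints them rather than running them. The node names, instance IDs, region, and device names are hypothetical placeholders, and it assumes AWS credentials come from the environment (for example an instance role), so no key options are passed.

```python
import shlex


def stonith_create_commands(host_map, region, delayed_node, delay_seconds=60):
    """Build one `pcs stonith create` command per node.

    host_map maps a cluster node name to its EC2 instance ID (hypothetical
    values in this sketch). Only the device that targets `delayed_node`
    gets a static delay.
    """
    commands = []
    for node, instance_id in sorted(host_map.items()):
        argv = [
            "pcs", "stonith", "create", f"fence_{node}", "fence_aws",
            f"region={region}",
            # Restrict this device to fencing a single node.
            f"pcmk_host_map={node}:{instance_id}",
        ]
        if node == delayed_node:
            argv.append(f"delay={delay_seconds}")
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # Placeholder node names and instance IDs, for illustration only.
    example_map = {
        "node1": "i-0123456789abcdef0",
        "node2": "i-0fedcba9876543210",
    }
    for argv in stonith_create_commands(example_map, "us-east-1", "node1"):
        print(shlex.join(argv))
```

Printing the commands instead of executing them keeps the sketch safe to run outside a cluster. The output can be reviewed and pasted on a cluster node.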


      Actual results:

      Both nodes get fenced. The node with the delay does not get rebooted until after its delay expires.


      Expected results:

      Only the node without the delay gets fenced.
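
The interaction between the fencing delay and the shutdown time can be illustrated with a toy timeline model. This is a simplified sketch, not Pacemaker's actual scheduling logic: it assumes both nodes notice the failure at t=0, that a node can still issue its pending fence request for as long as it is running, and that a fenced node keeps running for `shutdown_seconds` after the stop request. The function name and all timings are hypothetical.

```python
def fenced_nodes(delays, shutdown_seconds):
    """Return the set of nodes that end up fenced in a two-node fence race.

    delays maps each node name to the delay (in seconds) configured on the
    stonith device that targets that node. shutdown_seconds is how long a
    fenced node keeps running after the stop request; 0 models a hard
    power-off.
    """
    nodes = sorted(delays)
    if len(nodes) != 2:
        raise ValueError("this sketch only models a two-node cluster")

    # Each node asks to fence its peer once the delay on the peer's
    # device has expired.
    requests = sorted(
        (delays[victim], attacker, victim)
        for victim, attacker in ((nodes[0], nodes[1]), (nodes[1], nodes[0]))
    )

    stops_at = {}  # node -> time at which it actually stops running
    for when, attacker, victim in requests:
        # A node that stopped strictly before this moment cannot fence.
        if attacker in stops_at and stops_at[attacker] < when:
            continue
        stops_at.setdefault(victim, when + shutdown_seconds)
    return set(stops_at)


if __name__ == "__main__":
    delays = {"node1": 60, "node2": 0}  # delay=60 on the device for node1
    print(sorted(fenced_nodes(delays, shutdown_seconds=0)))
    print(sorted(fenced_nodes(delays, shutdown_seconds=90)))
```

In this model, a hard power-off (`shutdown_seconds=0`) leaves only the node without the delay fenced. A graceful shutdown that takes longer than the delay lets the fenced node fire its own request, so both nodes go down. This also matches the observation that a long enough delay may avoid the issue.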


      Additional info:

      This issue was reported by a user working for AWS. They've stated, "I'm more than happy to work side by side with Red Hat development to test and help to provide code for an improved agent version."

              rhn-engineering-oalbrigt Oyvind Albrigtsen
              rhn-support-nwahl Reid Wahl
              Oyvind Albrigtsen Oyvind Albrigtsen
              Cluster QE Cluster QE