RHEL-76173

Improve behavior when updating cluster configuration and corosync fails to reload [rhel-9]

    • Story
    • Resolution: Unresolved
    • Normal
    • rhel-9.7
    • pcs
    • rhel-sst-high-availability
    • 17
    • 23
    • 0
    • False

      This is a clone of issue RHEL-47300 to use for version rhel-9.7

      Original description:

      Goal

      • Following up on RHEL-24163, corosync can now report errors when a new configuration fails to apply. It would be convenient if pcs could revert to the original configuration when the corosync reload fails.
      • Currently, when pcs cluster config update fails because corosync cannot be reloaded, corosync keeps running with the original configuration, while corosync.conf has already been updated and distributed to the nodes. This can leave a node or the whole cluster unable to restart, and further configuration updates will fail as well. One example of such a misconfiguration is crypto options that corosync generally allows but that cannot be applied because of crypto policies (FIPS):
      # fips-mode-setup --check
      FIPS mode is enabled.
      # pcs cluster config update crypto cipher=aes192 hash=md5 model=openssl
      Sending updated corosync.conf to nodes...
      virt-498: Succeeded
      virt-497: Succeeded
      Warning: virt-497: Unable to reload corosync configuration: Unable to reload corosync configuration: Done
      ERROR from reload: Failed to set knet crypto - see syslog for more information
      Errors in appying config, corosync.conf might not match the running system
      Warning: virt-498: Unable to reload corosync configuration: Unable to reload corosync configuration: Done
      ERROR from reload: Failed to set knet crypto - see syslog for more information
      Errors in appying config, corosync.conf might not match the running system
      Error: Unable to perform operation on any available node/host, therefore it is not possible to continue
      Error: Errors have occurred, therefore pcs is unable to continue 
      # echo $?
      1 
      # pcs cluster stop --all && pcs cluster start --all
      virt-497: Stopping Cluster (pacemaker)...
      virt-498: Stopping Cluster (pacemaker)...
      virt-497: Stopping Cluster (corosync)...
      virt-498: Stopping Cluster (corosync)...
      virt-497: Error connecting to virt-497 - (HTTP error: 400)
      virt-498: Error connecting to virt-498 - (HTTP error: 400)
      Error: unable to start all nodes
      virt-497: Error connecting to virt-497 - (HTTP error: 400)
      virt-498: Error connecting to virt-498 - (HTTP error: 400)
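
      The revert behavior described in the goal could look roughly like the following. This is a minimal single-node sketch, not how pcs is implemented: update_with_rollback, ReloadFailed, the ".orig" backup suffix, and the reload callable are all hypothetical names chosen for illustration, and a real fix would also have to restore the file on every node that received the update.

```python
import os
import shutil
from pathlib import Path
from typing import Callable


class ReloadFailed(Exception):
    """Hypothetical error raised by the reload hook when the reload fails."""


def update_with_rollback(
    conf_path: Path, new_text: str, reload: Callable[[], None]
) -> None:
    """Write new_text to conf_path and call reload(); restore the old file on failure."""
    # Hypothetical backup name: the original file name plus ".orig".
    backup_path = conf_path.with_name(conf_path.name + ".orig")
    shutil.copy2(conf_path, backup_path)  # keep the known-good configuration
    try:
        conf_path.write_text(new_text)
        reload()  # assumed to raise an exception if the daemon rejects the config
    except Exception:
        # The running daemon still uses the old settings, so put the old
        # file back to keep the file on disk and the runtime state in sync.
        os.replace(backup_path, conf_path)
        raise
    backup_path.unlink()  # success: the backup is no longer needed
```

      With a reload hook that raises ReloadFailed, the file on disk ends up with its original content and the exception still reaches the caller, so the failure is reported rather than hidden.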

      Acceptance Criteria

      The acceptance criteria depend on the chosen approach. Options include keeping the original config file in place if the corosync reload fails, or improving the error message so that it tells the user exactly what happened and how to revert the change.
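
      For the error-message option, the sketch below shows one possible shape of such a message. The function name format_reload_failure and the wording are assumptions made for illustration and do not reflect actual pcs output; the default path is the usual location of corosync.conf.

```python
def format_reload_failure(
    failed_nodes: dict[str, str],
    conf_path: str = "/etc/corosync/corosync.conf",
) -> str:
    """Build a message that says what happened and how to recover (hypothetical wording)."""
    lines = [
        f"Error: {conf_path} was updated, but corosync could not reload it.",
        "The running cluster still uses the previous configuration.",
        "Nodes that failed to reload:",
    ]
    # List nodes in a stable order so the message is reproducible.
    for node in sorted(failed_nodes):
        lines.append(f"  {node}: {failed_nodes[node]}")
    lines.append(
        "To recover, correct the rejected options and run the update again, "
        f"or restore the previous {conf_path} on every node."
    )
    return "\n".join(lines)
```

      Calling it with a mapping such as {"virt-497": "Failed to set knet crypto"} yields a message that names the node and the reason and then states the two ways out.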

              tojeline@redhat.com Tomas Jelinek
              watson-automation Watson Automation
              Tomas Jelinek
              Cluster QE