This is a clone of issue RHEL-47300 to use for version rhel-9.7
–
Original description:
Goal
- Following up on RHEL-24163, corosync is now able to report errors when a new configuration fails to apply. It would be convenient if pcs could revert to the original configuration when the reload of corosync fails.
- Currently, when pcs cluster config update fails because corosync cannot be reloaded, corosync continues to run with the original configuration, whereas corosync.conf has already been updated and distributed to the nodes. This can result in situations where a node or cluster cannot be restarted, and additional configuration updates fail as well. An example of such a misconfiguration is crypto options that are generally allowed by corosync but cannot be applied due to crypto policies (FIPS):
    # fips-mode-setup --check
    FIPS mode is enabled.
    # pcs cluster config update crypto cipher=aes192 hash=md5 model=openssl
    Sending updated corosync.conf to nodes...
    virt-498: Succeeded
    virt-497: Succeeded
    Warning: virt-497: Unable to reload corosync configuration: Unable to reload corosync configuration: Done
    ERROR from reload: Failed to set knet crypto - see syslog for more information
    Errors in appying config, corosync.conf might not match the running system
    Warning: virt-498: Unable to reload corosync configuration: Unable to reload corosync configuration: Done
    ERROR from reload: Failed to set knet crypto - see syslog for more information
    Errors in appying config, corosync.conf might not match the running system
    Error: Unable to perform operation on any available node/host, therefore it is not possible to continue
    Error: Errors have occurred, therefore pcs is unable to continue
    # echo $?
    1
    # pcs cluster stop --all && pcs cluster start --all
    virt-497: Stopping Cluster (pacemaker)...
    virt-498: Stopping Cluster (pacemaker)...
    virt-497: Stopping Cluster (corosync)...
    virt-498: Stopping Cluster (corosync)...
    virt-497: Error connecting to virt-497 - (HTTP error: 400)
    virt-498: Error connecting to virt-498 - (HTTP error: 400)
    Error: unable to start all nodes
    virt-497: Error connecting to virt-497 - (HTTP error: 400)
    virt-498: Error connecting to virt-498 - (HTTP error: 400)
Acceptance Criteria
The acceptance criteria depend on the approach chosen for this issue. Possible approaches include keeping the original config file in place if the corosync reload fails, or improving the error message so that it tells the user exactly what happened and what to do to revert the change.
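To illustrate the first approach (keeping the original config file if the reload fails), here is a minimal single-node sketch. It is not how pcs is implemented: the function name update_config_with_rollback and the reload_fn callback are hypothetical, and a real fix would also have to redistribute the original file to every node that already received the new one.

```python
from typing import Callable


def update_config_with_rollback(
    conf_path: str, new_text: str, reload_fn: Callable[[], bool]
) -> bool:
    """Write new_text to conf_path and reload; restore the old file on failure.

    reload_fn is a hypothetical callback standing in for "ask corosync to
    reload its configuration"; it returns True on success, False on failure.
    """
    # Remember the configuration the running daemon was started with.
    with open(conf_path, "r", encoding="utf-8") as f:
        original_text = f.read()

    # Put the candidate configuration in place and try to apply it.
    with open(conf_path, "w", encoding="utf-8") as f:
        f.write(new_text)
    if reload_fn():
        return True

    # Reload failed: put the original file back so that the file on disk
    # matches what the daemon is actually running.
    with open(conf_path, "w", encoding="utf-8") as f:
        f.write(original_text)
    return False
```

With a callback that reports failure, the file on disk ends up unchanged, which is the state the cluster could still be restarted from.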
- clones RHEL-47300 Improve behavior when updating cluster configuration and corosync fails to reload [rhel-10]