Bug
Resolution: Done-Errata
rhel-8.10, rhel-9.4
corosync-3.1.8-2.el9
ce03c68394517ea8782a03968e2507a1096e9efe
rhel-sst-high-availability
ssg_filesystems_storage_and_HA
What were you trying to do that didn't work?
With the fix [1] for RHEL-13109 landed in kronosnet-1.28, a reload can fail to apply a changed crypto configuration because FIPS mode is enabled (i.e., the cipher or hash algorithm was changed to a non-FIPS one). In that case, corosync reports incorrect crypto configuration in the totem.crypto_* cmap keys (as shown by the corosync-cmapctl totem.crypto command). These keys show the desired new configuration, which was not applied, while the old crypto configuration remains in effect.
The failure to apply the new crypto config to cluster communication is correctly reported in syslog and corosync.log, but only on the first reload attempt. Afterwards, corosync believes that the current and desired crypto configs match (which is not true). Further reloads therefore become no-ops and do not log the knet crypto errors again.
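The bookkeeping problem described above can be illustrated with a small hypothetical model. This is not corosync's code. The `Node` class, its `reload_buggy`/`reload_expected` methods, and the rule that md5 is refused under FIPS are assumptions made for illustration; the md5 rule mirrors the "digital envelope routines::unsupported" error in the reproducer. The buggy variant updates the reported (cmap) config before knowing whether knet accepted it. Every later reload of the same config therefore looks like a no-op.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CryptoConfig:
    model: str
    cipher: str
    hash_alg: str


def knet_accepts(cfg: CryptoConfig, fips: bool) -> bool:
    # Assumption for this model only: with FIPS enabled, md5 is refused.
    return not (fips and cfg.hash_alg == "md5")


@dataclass
class Node:
    active: CryptoConfig    # what knet is actually using
    reported: CryptoConfig  # what the totem.crypto_* cmap keys show
    fips: bool = True
    errors: list = field(default_factory=list)

    def reload_buggy(self, desired: CryptoConfig) -> int:
        """Observed behaviour: cmap is updated even if knet rejects the config."""
        if desired == self.reported:
            return 0  # looks unchanged, so nothing is retried or logged
        self.reported = desired
        if knet_accepts(desired, self.fips):
            self.active = desired
        else:
            self.errors.append(f"failed to apply {desired}")
        return 0  # exit status does not reflect the failure

    def reload_expected(self, desired: CryptoConfig) -> int:
        """Requested behaviour: compare against the active config, fail loudly."""
        if desired == self.active:
            return 0
        if knet_accepts(desired, self.fips):
            self.active = self.reported = desired
            return 0
        self.errors.append(f"failed to apply {desired}")
        return 1  # non-zero exit; cmap keeps reporting the active config


old = CryptoConfig("openssl", "aes256", "sha256")
new = CryptoConfig("openssl", "aes192", "md5")

buggy = Node(active=old, reported=old)
statuses = [buggy.reload_buggy(new), buggy.reload_buggy(new)]
print(statuses, len(buggy.errors), buggy.reported == new, buggy.active == old)
# [0, 0] 1 True True
```

In the expected variant, both reloads of `new` return 1 and each one records an error, while the reported config stays equal to the active one.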
This misconfiguration only becomes apparent when corosync is fully (re)started. At that point it refuses to start because knet cannot use the requested crypto algorithms.
Changing from no encryption to a crypto config that fails to apply still results in the other node(s) being fenced. Thankfully, this makes a potentially fatal crypto misconfiguration easier to spot. However, it suggests that the fix for RHEL-13109 was incomplete. I'll report this in a separate issue against the kronosnet component.
Ideally (and I'm aware this would be a huge undertaking), the cluster stack should detect and report any mismatch between the active and the on-disk configuration. For example, it could show a prominent warning in the pcs status output, the PCSD Web UI, and the Red Hat Insights portal.
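Below is a minimal sketch of such a mismatch check. It assumes the on-disk settings are the crypto_* keys set directly inside the top-level totem { } block of corosync.conf. It also assumes the active settings come from some source that reflects what knet really uses. Because of this bug, the totem.crypto_* cmap keys cannot serve as that source after a failed reload. The usage example parses the pre-reload corosync-cmapctl output only to show that format. The function names are hypothetical, and implicit defaults for absent keys are not modelled.

```python
import re

# Matches corosync-cmapctl output lines such as "totem.crypto_cipher (str) = aes192".
CMAP_LINE = re.compile(r"^totem\.(crypto_\w+) \(\w+\) = (.*)$")


def crypto_from_conf(text: str) -> dict:
    """Return crypto_* keys set directly inside the top-level totem { } block."""
    found, depth, in_totem = {}, 0, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            if depth == 0 and line[:-1].strip() == "totem":
                in_totem = True
            depth += 1
        elif line == "}":
            depth -= 1
            if depth == 0:
                in_totem = False
        elif in_totem and depth == 1 and ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
            if key.startswith("crypto_"):
                found[key] = value
    return found


def crypto_from_cmapctl(text: str) -> dict:
    """Parse `corosync-cmapctl totem.crypto` output into {key: value}."""
    found = {}
    for line in text.splitlines():
        m = CMAP_LINE.match(line.strip())
        if m:
            found[m.group(1)] = m.group(2)
    return found


def crypto_mismatches(on_disk: dict, active: dict) -> dict:
    """Map each differing key to (on-disk value, active value); None means absent."""
    return {
        key: (on_disk.get(key), active.get(key))
        for key in sorted(on_disk.keys() | active.keys())
        if on_disk.get(key) != active.get(key)
    }


conf = """totem {
    version: 2
    crypto_cipher: aes192
    crypto_hash: md5
    crypto_model: openssl
    interface {
        linknumber: 0
    }
}
"""
active = crypto_from_cmapctl(
    "totem.crypto_cipher (str) = aes256\ntotem.crypto_hash (str) = sha256\n"
)
print(crypto_mismatches(crypto_from_conf(conf), active))
# {'crypto_cipher': ('aes192', 'aes256'), 'crypto_hash': ('md5', 'sha256'), 'crypto_model': ('openssl', None)}
```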
[1] https://github.com/kronosnet/kronosnet/pull/412
Please provide the package NVR for which bug is seen:
RHEL 8.10: kronosnet-1.28-1.el8 + corosync-3.1.8-1.el8
RHEL 9.4: kronosnet-1.28-1.el9 + corosync-3.1.8-1.el9
How reproducible:
always
Steps to reproduce
[root@virt-123 ~]# corosync-cmapctl totem.crypto
totem.crypto_cipher (str) = aes256
totem.crypto_hash (str) = sha256
[root@virt-123 ~]# pcs cluster config update crypto cipher=aes192 hash=md5 model=openssl
Sending updated corosync.conf to nodes...
virt-123: Succeeded
virt-124: Succeeded
virt-123: Corosync configuration reloaded
[root@virt-123 ~]# tail /var/log/cluster/corosync.log
Feb 05 17:44:33 [54306] virt-123 corosync info [TOTEM ] Configuring link 0
Feb 05 17:44:33 [54306] virt-123 corosync info [TOTEM ] Configured link number 0: local addr: 10.37.166.250, port=5405
Feb 05 17:44:33 [54306] virt-123 corosync error [TOTEM ] knet_handle_crypto_set_config (index 2) failed: -2
Feb 05 17:44:33 [54306] virt-123 corosync info [TOTEM ] kronosnet crypto reconfigured on index 2: openssl/aes192/md5
Feb 05 17:44:33 [54306] virt-123 corosync info [KNET ] common: crypto_openssl.so has been loaded from /usr/lib64/kronosnet/crypto_openssl.so
Feb 05 17:44:33 [54306] virt-123 corosync error [KNET ] opensslcrypto: Unable to set openssl context parameters: error:0308010C:digital envelope routines::unsupported
Feb 05 17:44:33 [54306] virt-123 corosync error [KNET ] crypto: Test of crypt operation failed - unsupported crypto module parameters
Feb 05 17:44:33 [54306] virt-123 corosync info [KNET ] pmtud: MTU manually set to: 0
Feb 05 17:44:33 [54306] virt-123 corosync error [TOTEM ] knet_handle_crypto_use_config 2 failed: Invalid argument
Feb 05 17:44:33 [54306] virt-123 corosync error [TOTEM ] knet_handle_crypto_set_config to clear index 1 failed: Device or resource busy
[root@virt-123 ~]# corosync-cmapctl totem.crypto
totem.crypto_cipher (str) = aes192
totem.crypto_hash (str) = md5
totem.crypto_model (str) = openssl
[root@virt-123 ~]# corosync-cfgtool -R
Reloading corosync.conf...
Done
[root@virt-123 ~]# echo $?
0
[root@virt-123 ~]# tail /var/log/cluster/corosync.log
[...]
Feb 05 17:44:45 [54306] virt-123 corosync notice [CFG ] Config reload requested by node 1
Feb 05 17:44:45 [54306] virt-123 corosync info [TOTEM ] Configuring link 0
Feb 05 17:44:45 [54306] virt-123 corosync info [TOTEM ] Configured link number 0: local addr: 10.37.166.250, port=5405
Feb 05 17:44:45 [54306] virt-123 corosync info [KNET ] pmtud: MTU manually set to: 0
[root@virt-123 ~]# pcs cluster stop --all --wait
virt-123: Stopping Cluster (pacemaker)...
virt-124: Stopping Cluster (pacemaker)...
virt-123: Stopping Cluster (corosync)...
virt-124: Stopping Cluster (corosync)...
[root@virt-123 ~]# pcs cluster start
Starting Cluster...
Error: Unable to start corosync: Job for corosync.service failed because the control process exited with error code.
See "systemctl status corosync.service" and "journalctl -xeu corosync.service" for details.
[root@virt-123 ~]# journalctl -u corosync
Feb 05 17:45:07 virt-123 systemd[1]: Starting Corosync Cluster Engine...
Feb 05 17:45:07 virt-123 corosync[60606]: [MAIN ] Corosync Cluster Engine 3.1.8 starting up
Feb 05 17:45:07 virt-123 corosync[60606]: [MAIN ] Corosync built-in features: dbus systemd xmlconf vqsim nozzle snmp pie relro bindnow
Feb 05 17:45:07 virt-123 corosync[60606]: [TOTEM ] Initializing transport (Kronosnet).
Feb 05 17:45:07 virt-123 corosync[60606]: [TOTEM ] knet_handle_crypto_set_config (index 1) failed: -2
Feb 05 17:45:07 virt-123 corosync[60606]: [KNET ] pmtud: MTU manually set to: 0
Feb 05 17:45:07 virt-123 corosync[60606]: [KNET ] common: crypto_openssl.so has been loaded from /usr/lib64/kronosnet/crypto_openssl.so
Feb 05 17:45:07 virt-123 corosync[60606]: [KNET ] opensslcrypto: Unable to set openssl context parameters: error:0308010C:digital envelope routines::unsupported
Feb 05 17:45:07 virt-123 corosync[60606]: [KNET ] crypto: Test of crypt operation failed - unsupported crypto module parameters
Feb 05 17:45:07 virt-123 corosync[60606]: [MAIN ] Can't initialize TOTEM layer
Feb 05 17:45:07 virt-123 corosync[60606]: [MAIN ] Corosync Cluster Engine exiting with status 15 at main.c:1608.
Feb 05 17:45:07 virt-123 systemd[1]: corosync.service: Main process exited, code=exited, status=15/n/a
Feb 05 17:45:07 virt-123 systemd[1]: corosync.service: Failed with result 'exit-code'.
Feb 05 17:45:07 virt-123 systemd[1]: Failed to start Corosync Cluster Engine.
Expected results
Better error reporting on reload when changing crypto configuration fails:
- corosync-cfgtool -R should print an error to stderr and exit with a non-zero status when the updated crypto config fails to apply. It currently exits 0 with no errors or warnings, so the admin has to notice the errors in syslog/corosync.log independently.
- this would also let the higher-level pcs cluster config update command detect the error, which it currently cannot do
- repeated reload attempts with the same crypto config that cannot be applied should keep logging the knet crypto errors instead of appearing to succeed
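Until corosync-cfgtool -R reports the failure itself, an admin could scan corosync.log for the knet crypto errors. This sketch assumes the corosync.log line format from the reproducer above. It treats any error-level TOTEM/KNET line that mentions "crypt" and appears after the last "Config reload requested" line as a reload failure. The function name and this filtering heuristic are assumptions, not a corosync interface.

```python
import re

# Matches error-level TOTEM/KNET lines in the corosync.log format from the reproducer, e.g.
# "Feb 05 17:44:33 [54306] virt-123 corosync error [TOTEM ] knet_handle_crypto_set_config ..."
ERROR_LINE = re.compile(r"\bcorosync error\s+\[(?:TOTEM|KNET)\s*\]\s+(.*)$")
RELOAD_MARKER = "Config reload requested"


def crypto_errors_since_last_reload(log_text: str) -> list:
    """Return crypto-related error messages logged after the last reload request.

    If no reload marker is present (e.g. the tail was truncated), the whole
    text is scanned.
    """
    lines = log_text.splitlines()
    start = 0
    for i, line in enumerate(lines):
        if RELOAD_MARKER in line:
            start = i
    errors = []
    for line in lines[start:]:
        m = ERROR_LINE.search(line)
        if m and "crypt" in m.group(1):
            errors.append(m.group(1))
    return errors


first_reload = """\
Feb 05 17:44:33 [54306] virt-123 corosync info [TOTEM ] Configuring link 0
Feb 05 17:44:33 [54306] virt-123 corosync error [TOTEM ] knet_handle_crypto_set_config (index 2) failed: -2
Feb 05 17:44:33 [54306] virt-123 corosync error [KNET ] crypto: Test of crypt operation failed - unsupported crypto module parameters
"""
second_reload = first_reload + """\
Feb 05 17:44:45 [54306] virt-123 corosync notice [CFG ] Config reload requested by node 1
Feb 05 17:44:45 [54306] virt-123 corosync info [KNET ] pmtud: MTU manually set to: 0
"""
print(len(crypto_errors_since_last_reload(first_reload)))  # 2
print(crypto_errors_since_last_reload(second_reload))  # []
```

A non-empty result indicates that the reload failed. An empty result after a repeated reload of the same config proves nothing, because corosync treats that reload as a no-op and logs no crypto errors.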
Actual results
Corosync is lying to the cluster admin about which crypto algorithms are in use.
Links to: RHBA-2024:135529 corosync update