RHEL-24163

Inconsistent crypto config reload with kronosnet-1.28

    • Bug
    • Resolution: Done-Errata
    • Undefined
    • rhel-9.5
    • rhel-8.10, rhel-9.4
    • corosync
    • corosync-3.1.8-2.el9
    • None
    • None
    • ce03c68394517ea8782a03968e2507a1096e9efe
    • rhel-sst-high-availability
    • ssg_filesystems_storage_and_HA
    • 12
    • 19
    • 4
    • False
    • No
    • None
    • None

      What were you trying to do that didn't work?

      With the fix [1] for RHEL-13109 landing in kronosnet-1.28, changing the corosync crypto configuration to a combination that the reload command cannot apply because FIPS is enabled (i.e., changing the cipher or hash algorithm to a non-FIPS one) leaves corosync reporting the wrong crypto configuration. The totem.crypto_* keys in the cmap (as shown by the corosync-cmapctl totem.crypto command) contain the desired new configuration, which was not applied, while the old crypto config is still in effect.

      The failure to apply the new crypto config to cluster communication is correctly reported in syslog and corosync.log, but only on the first reload attempt. Afterwards, corosync believes that the current and desired crypto configs match (which is not true), so reload becomes a no-op and the knet crypto errors are not logged again.

      This misconfiguration only becomes apparent when fully (re)starting corosync, as it will refuse to start due to knet being unable to use the requested crypto algorithms.
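Until reload failures are reported properly, one way to avoid the situation described above is to check the requested hash against the FIPS state before touching the configuration. The following is a minimal sketch, not part of corosync or pcs: the function names crypto_hash_usable and fips_enabled are hypothetical, and the deny list contains only md5 because that is the one algorithm this report shows being rejected. The authoritative answer depends on the system crypto policy.

```shell
# Hypothetical pre-flight helper (not a corosync or pcs feature).
# $1 = requested crypto_hash value, $2 = FIPS flag ("1" means enabled)
# Returns 0 if the hash looks usable, 1 if it is expected to be rejected.
crypto_hash_usable() {
    hash="$1"
    fips="$2"
    if [ "$fips" = "1" ]; then
        case "$hash" in
            # Assumption: md5 is the only hash on the deny list, because it is
            # the one this report shows failing in FIPS mode.
            md5) return 1 ;;
        esac
    fi
    return 0
}

# Print the kernel FIPS flag; a missing file is treated as "FIPS disabled".
fips_enabled() {
    cat /proc/sys/crypto/fips_enabled 2>/dev/null || echo 0
}
```

A caller could then run `crypto_hash_usable md5 "$(fips_enabled)" || echo "md5 is not usable in FIPS mode" >&2` before invoking pcs cluster config update.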

      Trying to go from no encryption to failing-to-apply crypto config still results in the other node(s) getting fenced. While this, thankfully, makes a potentially fatal crypto misconfiguration easier to spot, it suggests the fix for RHEL-13109 was incomplete. I'll report this in a separate issue for the kronosnet component.

      Ideally (and I'm aware this would be a huge undertaking), the cluster stack should be able to spot and report any mismatch between the active and stored-on-disk configuration, e.g., by showing a prominent warning in the pcs status output, the PCSD Web UI, and the Red Hat Insights portal.
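As a rough illustration of the comparison suggested above, the sketch below normalizes the crypto keys from corosync.conf and from corosync-cmapctl output into the same key=value form so they can be diffed. The helper names cmap_crypto and conf_crypto are hypothetical, and the parsing assumes the simple "key: value" layout of corosync.conf. Note the limitation: with the bug described in this report, the cmap already mirrors the on-disk values, so this check would report a match even though knet is still using the old settings. It only catches edits to corosync.conf that have not been reloaded yet.

```shell
# Hypothetical helpers; both read from stdin and print sorted key=value lines.

# Normalize `corosync-cmapctl totem.crypto` output, e.g.
#   totem.crypto_cipher (str) = aes256  ->  crypto_cipher=aes256
cmap_crypto() {
    awk '$1 ~ /^totem\.crypto_/ { sub(/^totem\./, "", $1); print $1 "=" $NF }' | sort
}

# Normalize corosync.conf lines, e.g.
#   "    crypto_cipher: aes256"  ->  crypto_cipher=aes256
# Assumption: one "key: value" pair per line, no trailing comments.
conf_crypto() {
    awk -F':[ \t]*' '/^[ \t]*crypto_(cipher|hash|model)[ \t]*:/ {
        gsub(/[ \t]/, "", $1); gsub(/[ \t]/, "", $2); print $1 "=" $2
    }' | sort
}
```

In bash, the two views could be compared with `diff <(conf_crypto < /etc/corosync/corosync.conf) <(corosync-cmapctl totem.crypto | cmap_crypto)`; any output indicates a difference between the stored and the reported configuration.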

      [1] https://github.com/kronosnet/kronosnet/pull/412

      Please provide the package NVR for which the bug is seen:

      RHEL 8.10: kronosnet-1.28-1.el8 + corosync-3.1.8-1.el8
      RHEL 9.4: kronosnet-1.28-1.el9 + corosync-3.1.8-1.el9

      How reproducible:

      always

      Steps to reproduce

      [root@virt-123 ~]# corosync-cmapctl totem.crypto
      totem.crypto_cipher (str) = aes256
      totem.crypto_hash (str) = sha256
      [root@virt-123 ~]# pcs cluster config update crypto cipher=aes192 hash=md5 model=openssl
      Sending updated corosync.conf to nodes...
      virt-123: Succeeded
      virt-124: Succeeded
      virt-123: Corosync configuration reloaded
      [root@virt-123 ~]# tail /var/log/cluster/corosync.log 
      Feb 05 17:44:33 [54306] virt-123 corosync info    [TOTEM ] Configuring link 0
      Feb 05 17:44:33 [54306] virt-123 corosync info    [TOTEM ] Configured link number 0: local addr: 10.37.166.250, port=5405
      Feb 05 17:44:33 [54306] virt-123 corosync error   [TOTEM ] knet_handle_crypto_set_config (index 2) failed: -2
      Feb 05 17:44:33 [54306] virt-123 corosync info    [TOTEM ] kronosnet crypto reconfigured on index 2: openssl/aes192/md5
      Feb 05 17:44:33 [54306] virt-123 corosync info    [KNET  ] common: crypto_openssl.so has been loaded from /usr/lib64/kronosnet/crypto_openssl.so
      Feb 05 17:44:33 [54306] virt-123 corosync error   [KNET  ] opensslcrypto: Unable to set openssl context parameters: error:0308010C:digital envelope routines::unsupported
      Feb 05 17:44:33 [54306] virt-123 corosync error   [KNET  ] crypto: Test of crypt operation failed - unsupported crypto module parameters
      Feb 05 17:44:33 [54306] virt-123 corosync info    [KNET  ] pmtud: MTU manually set to: 0
      Feb 05 17:44:33 [54306] virt-123 corosync error   [TOTEM ] knet_handle_crypto_use_config 2 failed: Invalid argument
      Feb 05 17:44:33 [54306] virt-123 corosync error   [TOTEM ] knet_handle_crypto_set_config to clear index 1 failed: Device or resource busy
      [root@virt-123 ~]# corosync-cmapctl totem.crypto
      totem.crypto_cipher (str) = aes192
      totem.crypto_hash (str) = md5
      totem.crypto_model (str) = openssl
      [root@virt-123 ~]# corosync-cfgtool -R
      Reloading corosync.conf...
      Done
      [root@virt-123 ~]# echo $?
      0
      [root@virt-123 ~]# tail /var/log/cluster/corosync.log
      [...]
      Feb 05 17:44:45 [54306] virt-123 corosync notice  [CFG   ] Config reload requested by node 1
      Feb 05 17:44:45 [54306] virt-123 corosync info    [TOTEM ] Configuring link 0
      Feb 05 17:44:45 [54306] virt-123 corosync info    [TOTEM ] Configured link number 0: local addr: 10.37.166.250, port=5405
      Feb 05 17:44:45 [54306] virt-123 corosync info    [KNET  ] pmtud: MTU manually set to: 0
      [root@virt-123 ~]# pcs cluster stop --all --wait
      virt-123: Stopping Cluster (pacemaker)...
      virt-124: Stopping Cluster (pacemaker)...
      virt-123: Stopping Cluster (corosync)...
      virt-124: Stopping Cluster (corosync)...
      [root@virt-123 ~]# pcs cluster start
      Starting Cluster...
      Error: Unable to start corosync: Job for corosync.service failed because the control process exited with error code.
      See "systemctl status corosync.service" and "journalctl -xeu corosync.service" for details.
      [root@virt-123 ~]# journalctl -u corosync
      Feb 05 17:45:07 virt-123 systemd[1]: Starting Corosync Cluster Engine...
      Feb 05 17:45:07 virt-123 corosync[60606]:   [MAIN  ] Corosync Cluster Engine 3.1.8 starting up
      Feb 05 17:45:07 virt-123 corosync[60606]:   [MAIN  ] Corosync built-in features: dbus systemd xmlconf vqsim nozzle snmp pie relro bindnow
      Feb 05 17:45:07 virt-123 corosync[60606]:   [TOTEM ] Initializing transport (Kronosnet).
      Feb 05 17:45:07 virt-123 corosync[60606]:   [TOTEM ] knet_handle_crypto_set_config (index 1) failed: -2
      Feb 05 17:45:07 virt-123 corosync[60606]:   [KNET  ] pmtud: MTU manually set to: 0
      Feb 05 17:45:07 virt-123 corosync[60606]:   [KNET  ] common: crypto_openssl.so has been loaded from /usr/lib64/kronosnet/crypto_openssl.so
      Feb 05 17:45:07 virt-123 corosync[60606]:   [KNET  ] opensslcrypto: Unable to set openssl context parameters: error:0308010C:digital envelope routines::unsupported
      Feb 05 17:45:07 virt-123 corosync[60606]:   [KNET  ] crypto: Test of crypt operation failed - unsupported crypto module parameters
      Feb 05 17:45:07 virt-123 corosync[60606]:   [MAIN  ] Can't initialize TOTEM layer
      Feb 05 17:45:07 virt-123 corosync[60606]:   [MAIN  ] Corosync Cluster Engine exiting with status 15 at main.c:1608.
      Feb 05 17:45:07 virt-123 systemd[1]: corosync.service: Main process exited, code=exited, status=15/n/a
      Feb 05 17:45:07 virt-123 systemd[1]: corosync.service: Failed with result 'exit-code'.
      Feb 05 17:45:07 virt-123 systemd[1]: Failed to start Corosync Cluster Engine.
      

      Expected results

      Better error reporting on reload when changing crypto configuration fails:

      • corosync-cfgtool -R should log an error to stderr and exit non-zero when the updated crypto config fails to apply. Currently it exits 0 with no errors or warnings, and the admin has to notice the errors in syslog/corosync.log independently.
      • because of the above, the higher-level pcs cluster config update command is also unable to notice the error
      • repeated reload attempts with the same inapplicable crypto config should keep logging the knet crypto errors instead of appearing to succeed
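Until corosync-cfgtool -R reports such failures itself, an admin could wrap the reload and scan only the newly written log lines for the knet crypto errors shown in the reproducer. This is a sketch under assumptions: crypto_errors and reload_and_check are hypothetical names, the default log path is the one used in this report, log rotation is ignored, and the one-second wait is a guess. Because the errors are logged only on the first reload attempt, this wrapper cannot detect the failure on later reloads.

```shell
# Hypothetical wrapper (not part of corosync).

# Print log lines (from stdin) that report a failed knet crypto (re)configuration.
# The pattern is derived from the TOTEM error lines quoted in this report.
crypto_errors() {
    grep -E 'knet_handle_crypto_(set|use)_config .*failed'
}

# Reload corosync.conf and inspect only the log lines written afterwards.
# Returns 1 if the reload command itself fails, 2 if crypto errors were logged.
reload_and_check() {
    log="${1:-/var/log/cluster/corosync.log}"
    before=$(wc -l < "$log")
    corosync-cfgtool -R || return 1
    sleep 1   # assumption: give corosync a moment to write its log
    if tail -n "+$((before + 1))" "$log" | crypto_errors >&2; then
        return 2
    fi
    return 0
}
```

Calling `reload_and_check` right after distributing a new corosync.conf would then return a non-zero status for the first failed crypto change instead of the silent success shown in the steps to reproduce.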

      Actual results

      Corosync misreports to the cluster admin which crypto algorithms are in use: the cmap shows the requested configuration while the previous one remains active.

              rhn-support-phagara Patrik Hagara
              rhn-support-phagara Patrik Hagara
              Jan Friesse Jan Friesse
              Patrik Hagara Patrik Hagara