RHEL-98672

gnutls corrupts session state with multiple threads due to TLS 1.3 rekeying


    • gnutls-3.8.10-2.el10
    • No
    • Low
    • ZStream
    • 1
    • rhel-security-crypto-spades
    • 26
    • 2
    • False
    • False
    • None
    • No
    • Crypto25August
    • Regression Exception
      1. gnutls can successfully transfer 1 TB one way
      2. gnutls can re-key 20 times during a successful transfer
      3. reproducer provided upstream passes

      [/CoreOS/gnutls/Regression/RHEL-98672-rekeying-corrupts-state-TLS1-3, /CoreOS/gnutls/Sanity/large-transfer]
    • Pass
    • Automated
    • Unspecified Release Note Type - Unknown
    • Unspecified
    • Unspecified
    • Unspecified
    • None

      What were you trying to do that didn't work?

      The GnuTLS documentation on thread safety lists conditions which, if satisfied, are expected to allow a gnutls_session_t object to be used for concurrent I/O from two threads (i.e. for parallel gnutls_record_send and gnutls_record_recv, but nothing else):

      https://gnutls.org/manual/gnutls.html#Thread-safety

      QEMU follows this guidance, but despite that, we find that GnuTLS will corrupt session state due to TLS 1.3 automatic re-keying.

      The same issue is also reported upstream, in greater detail:

      https://gitlab.com/gnutls/gnutls/-/issues/1717

      What is the impact of this issue to you?

      When QEMU live migration has its "postcopy" feature enabled, it uses a bidirectional TCP channel. When TLS is enabled for migration, QEMU will therefore have two threads using the gnutls_session_t: one sending and one receiving.

       

      If the live migration stays in the "precopy" phase for long enough, a TLS 1.3 rekey will be initiated. When QEMU is then told to switch to the "postcopy" phase, triggering recvs on the source QEMU, the TLS session will fail with "Decryption has failed."

      Please provide the package NVR for which the bug is seen:

      gnutls-3.8.3-6.el9.x86_64

      How reproducible is this bug?:

      Slightly non-deterministic, but usually possible to trigger when TLS 1.3 is enabled with a cipher that requires rekeying

      Steps to reproduce

      1. Save the demo program tlsrekey.c
      2. gcc -o tlsrekey tlsrekey.c -lgnutls -DSERVER_MANUAL_REKEY=1
      3. ./tlsrekey

      As an alternative to "SERVER_MANUAL_REKEY", either "SERVER_AUTO_REKEY" or "CLIENT_AUTO_REKEY", or both, can be defined at build time. This is much slower to reproduce, as auto-rekeying waits for 16 million records, whereas the demo program's manual rekeying triggers every 2 seconds. The errors observed are slightly different with CLIENT_AUTO_REKEY, but likely have the same root cause.

      Expected results

      1879: client sender
      1880: server echo
      1878: client receiver
      1880: manual key update
      1880: manual key update
      1880: manual key update
      1880: manual key update
      ..keying without errors forever..
      

       

      Actual results

      1879: client sender
      1880: server echo
      1878: client receiver
      1880: manual key update
      1880: echo recv: Decryption has failed.
      

       

              dueno@redhat.com Daiki Ueno
              rhn-engineering-berrange Daniel Berrangé
              Daiki Ueno Daiki Ueno
              Alexander Sosedkin Alexander Sosedkin