  1. RHEL
  2. RHEL-67304

NFS with TLS: Hang during TLS handshake when performing LIF migration (HA failover)


    • kernel-5.14.0-555.el9
    • No
    • Moderate
    • ZStream
    • rhel-sst-filesystems
    • rhel-sst-filesystems
    • 22
    • 24
    • 5
    • False
    • None
    • Red Hat Enterprise Linux
    • None
    • x86_64
    • None

      TLDR;

      NetApp is performing HA tests with TLS enabled. When they move the LIF (IP) to another host, they encounter an I/O hang, with the trace shown below. The same test works without encryption.

      Please provide the package NVR for which the bug is seen:

      • ktls-utils-0.11-1.el9_4.x86_64
      • kernel 5.14.0-427.35.1.el9_4.x86_64
      • ONTAP 9.15.1 GA

      How reproducible is this bug?:

      100%

      Steps to reproduce

          1.  Configure TLS on ONTAP and on the RHEL 9.4 client
          2.  Mount the NFS share using NFS over TLS
          3.  Migrate the LIFs from Node-1 to Node-2 while the share is mounted

      Actual results

      I also ran into another issue. If a LIF migration happens exactly while the client is performing the TLS handshake, the handshake fails. The client does not appear to retry the handshake, which causes the mount to hang:

      Console output:
      
      root@scs000379747:~/dhairesh_tests #cp 1gb /mnt
      tlshd[32566]: Handshake with 10.224.118.192 (10.224.118.192) was successful
      tlshd[32573]: Handshake with 10.224.118.192 (10.224.118.192) was successful
      tlshd[32576]: Handshake with 10.224.118.192 (10.224.118.192) was successful
      tlshd[32577]: Handshake with 10.224.118.192 (10.224.118.192) was successful
      tlshd[32578]: gnutls: The TLS connection was non-properly terminated. (-110)
      tlshd[32578]: Handshake with '10.224.118.192' (10.224.118.192) failed
      cp: failed to close '/mnt/1gb': Interrupted system call
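
      For triage, the question raised above (was a failed handshake ever followed by a successful one?) can be checked mechanically against the tlshd messages. The following is only a minimal sketch: the regular expression is modelled on the two message shapes visible in this console output, and the helper name `last_handshake_state` is hypothetical, not part of ktls-utils.

```python
import re

# Matches the two tlshd message shapes seen in the console output:
#   "... Handshake with <name> (<ip>) was successful"
#   "... Handshake with '<name>' (<ip>) failed"
EVENT_RE = re.compile(
    r"tlshd\[\d+\]: Handshake with \S+ \((?P<peer>[\d.]+)\) "
    r"(?P<result>was successful|failed)"
)

def last_handshake_state(lines):
    """Return {peer_ip: result} for the most recent handshake event per peer."""
    state = {}
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            state[m.group("peer")] = m.group("result")
    return state

# Lines copied from the console output in this report.
sample = [
    "tlshd[32577]: Handshake with 10.224.118.192 (10.224.118.192) was successful",
    "tlshd[32578]: gnutls: The TLS connection was non-properly terminated. (-110)",
    "tlshd[32578]: Handshake with '10.224.118.192' (10.224.118.192) failed",
]
state = last_handshake_state(sample)
print(state)
```

      A peer whose last recorded state is `failed` had no later successful handshake in the captured log, which would be consistent with the client not retrying.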
      

      Let me know if you need a packet trace for this one. The last few frames are like this:

      110067 2024-11-11 04:45:48.286040486 10.224.118.192 → 10.235.229.141 TCP 60 2049 → 832 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
      110068 2024-11-11 04:45:48.286290188 10.235.229.141 → 10.224.118.192 TCP 74 869 → 2049 [SYN] Seq=0 Win=32120 Len=0 MSS=1460 SACK_PERM=1 TSval=1966037677 TSecr=0 WS=128
      110069 2024-11-11 04:45:48.286550228 10.224.118.192 → 10.235.229.141 TCP 60 2049 → 869 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
      110070 2024-11-11 04:45:48.286951801 10.235.229.141 → 10.224.118.192 TCP 74 790 → 2049 [SYN] Seq=0 Win=32120 Len=0 MSS=1460 SACK_PERM=1 TSval=1966037677 TSecr=0 WS=128
      110071 2024-11-11 04:45:49.313642573 10.235.229.141 → 10.224.118.192 TCP 74 [TCP Retransmission] 790 → 2049 [SYN] Seq=0 Win=32120 Len=0 MSS=1460 SACK_PERM=1 TSval=1966038704 TSecr=0 WS=128
      110072 2024-11-11 04:45:49.314175007 10.224.118.192 → 10.235.229.141 TCP 74 2049 → 790 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256 SACK_PERM=1 TSval=3527665269 TSecr=1966038704
      110073 2024-11-11 04:45:49.314296915 10.235.229.141 → 10.224.118.192 TCP 66 790 → 2049 [ACK] Seq=1 Ack=1 Win=32128 Len=0 TSval=1966038705 TSecr=3527665269
      110074 2024-11-11 04:45:49.314380861 10.235.229.141 → 10.224.118.192 NFS 110 V3 NULL Call
      110075 2024-11-11 04:45:49.315091418 10.224.118.192 → 10.235.229.141 NFS 102 V3 NULL Reply (Call In 110074)
      110076 2024-11-11 04:45:49.315114074 10.235.229.141 → 10.224.118.192 TCP 66 790 → 2049 [ACK] Seq=45 Ack=37 Win=32128 Len=0 TSval=1966038705 TSecr=3527665269
      110077 2024-11-11 04:45:49.428898720 10.235.229.141 → 10.224.118.192 RPC 423 Continuation
      110078 2024-11-11 04:45:49.452339388 10.224.118.192 → 10.235.229.141 TCP 66 2049 → 790 [RST, ACK] Seq=37 Ack=402 Win=0 Len=0 TSval=3527665399 TSecr=1966038819
      110079 2024-11-11 05:36:12.377102671 10.235.229.141 → 10.224.118.192 Portmap 98 V2 GETPORT Call MOUNT(100005) V:3 UDP
      110080 2024-11-11 05:36:12.377812568 10.224.118.192 → 10.235.229.141 Portmap 70 V2 GETPORT Reply (Call In 110079) Port:635
      110081 2024-11-11 05:36:12.377941581 10.235.229.141 → 10.224.118.192 MOUNT 82 V3 NULL Call
      110082 2024-11-11 05:36:12.378609690 10.224.118.192 → 10.235.229.141 MOUNT 66 V3 NULL Reply (Call In 110081)
      110083 2024-11-11 05:36:12.378811223 10.235.229.141 → 10.224.118.192 MOUNT 130 V3 UMNT Call /data
      110084 2024-11-11 05:36:12.379505956 10.224.118.192 → 10.235.229.141 MOUNT 66 V3 UMNT Reply (Call In 110083)
      110085 2024-11-11 05:36:13.756411344 10.235.229.141 → 10.224.118.192 Portmap 98 V2 GETPORT Call MOUNT(100005) V:3 UDP
      110086 2024-11-11 05:36:13.757228673 10.224.118.192 → 10.235.229.141 Portmap 70 V2 GETPORT Reply (Call In 110085) Port:635
      110087 2024-11-11 05:36:13.757384437 10.235.229.141 → 10.224.118.192 MOUNT 82 V3 NULL Call
      110088 2024-11-11 05:36:13.758118896 10.224.118.192 → 10.235.229.141 MOUNT 66 V3 NULL Reply (Call In 110087)
      110089 2024-11-11 05:36:13.758405397 10.235.229.141 → 10.224.118.192 MOUNT 130 V3 UMNT Call /data
      110090 2024-11-11 05:36:13.758979219 10.224.118.192 → 10.235.229.141 MOUNT 66 V3 UMNT Reply (Call In 110089)
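
      The interesting part of this trace is the silence after the server's RST in frame 110078. A small script can locate such a gap in a longer capture. This is a sketch only: it assumes the text layout shown above (frame number, date, time with nanoseconds, source, arrow, destination, protocol), and the function names are hypothetical.

```python
import re
from datetime import datetime

# One tshark text line: "<num> <date> <time>.<nanos> <src> → <dst> <proto> <rest>"
FRAME_RE = re.compile(
    r"^\s*(?P<num>\d+) (?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2})\.(?P<frac>\d+) "
    r"(?P<src>\S+) → (?P<dst>\S+) (?P<proto>\S+) (?P<rest>.*)$"
)
EPOCH = datetime(1970, 1, 1)

def parse_frame(line):
    """Return (frame_number, seconds, protocol, rest) or None if no match."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{m['date']} {m['time']}", "%Y-%m-%d %H:%M:%S")
    # datetime only keeps microseconds, so add the fractional part separately.
    seconds = (ts - EPOCH).total_seconds() + float("0." + m["frac"])
    return int(m["num"]), seconds, m["proto"], m["rest"]

def largest_gap(lines):
    """Return (gap_seconds, frame_before, frame_after) for the longest pause."""
    frames = [f for f in map(parse_frame, lines) if f is not None]
    best = None
    for prev, cur in zip(frames, frames[1:]):
        gap = cur[1] - prev[1]
        if best is None or gap > best[0]:
            best = (gap, prev[0], cur[0])
    return best

# Three consecutive frames copied from the trace in this report.
sample = [
    "110077 2024-11-11 04:45:49.428898720 10.235.229.141 → 10.224.118.192 RPC 423 Continuation",
    "110078 2024-11-11 04:45:49.452339388 10.224.118.192 → 10.235.229.141 TCP 66 2049 → 790 [RST, ACK] Seq=37 Ack=402 Win=0 Len=0 TSval=3527665399 TSecr=1966038819",
    "110079 2024-11-11 05:36:12.377102671 10.235.229.141 → 10.224.118.192 Portmap 98 V2 GETPORT Call MOUNT(100005) V:3 UDP",
]
gap, before, after = largest_gap(sample)
print(f"longest pause: {gap:.1f} s between frames {before} and {after}")
```

      On the frames shown here, the longest pause falls between the RST at 04:45:49 and the Portmap call at 05:36:12, roughly 50 minutes in which the excerpt shows no new connection attempt from the client to port 2049.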
      

       

       

        1. forRHWithPatch00 (100.00 MB)
        2. forRHWithPatch01 (100.00 MB)
        3. forRHWithPatch02 (64.17 MB)

              bcodding@redhat.com Benjamin Coddington
              nilskoenigrh Nils Koenig
              Olga Kornievskaia Olga Kornievskaia
              Yongcheng Yang Yongcheng Yang