- Bug
- Resolution: Unresolved
- Major
- rhel-9.4.z
- Moderate
- ZStream
- rhel-sst-filesystems
- ssg_filesystems_storage_and_HA
- Red Hat Enterprise Linux
- x86_64
TL;DR
NetApp is running HA tests with kTLS enabled. When the LIF (IP) is moved to another host, I/O hangs with the backtrace shown below. The same test without encryption works.
Please provide the package NVR for which the bug is seen:
- ktls-utils-0.11-1.el9_4.x86_64
- kernel 5.14.0-427.35.1.el9_4.x86_64
- ONTAP 9.15.1 GA
How reproducible is this bug?:
100%
Steps to reproduce
1. Configure TLS on ONTAP and on the RHEL 9.4 client.
2. Mount the NFS share using NFS over TLS.
3. Start a copy from a local directory to the share using the Linux cp command:
cp -rf /home/test_data/* /mnt/nfs_share1/
4. Migrate the LIFs from Node-1 to Node-2.
5. The copy hangs. The client needs a reboot to recover.
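The client side of steps 2 and 3 could be sketched roughly as below. This is only a sketch under assumptions: the server name `ontap-lif1.example.com` and the export path `/nfs_share1` are hypothetical placeholders (the report does not give them), and it assumes TLS is enabled through the `tlshd` service from ktls-utils and the `xprtsec=tls` NFS mount option. By default the function only prints the commands; setting `DRY_RUN=0` would execute them.

```shell
#!/bin/sh
# Hypothetical placeholders: the report names neither the server nor the export.
SERVER="${SERVER:-ontap-lif1.example.com}"
EXPORT="${EXPORT:-/nfs_share1}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/nfs_share1}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro_client_steps() {
    # Assumption: the TLS handshake daemon from ktls-utils must be running.
    run systemctl start tlshd
    # Step 2: mount the share with transport-layer security requested.
    run mount -t nfs -o xprtsec=tls "$SERVER:$EXPORT" "$MOUNTPOINT"
    # Step 3: start the copy that later hangs during LIF migration.
    run cp -rf /home/test_data/* "$MOUNTPOINT"/
}
```

Calling `repro_client_steps` with the defaults lists the three commands without touching the system. The LIF migration in step 4 is performed on the ONTAP side and is not covered here.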
Actual results
> [16957.040410] INFO: task cp:7183 blocked for more than 122 seconds.
> [16957.040669] Not tainted 5.14.0-427.35.1.el9_4.x86_64 #1
> [16957.040786] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [16957.040947] task:cp state stack:0 pid:7183 ppid:7066 flags:0x00004004
> [16957.040956] Call Trace:
> [16957.040958] <TASK>
> [16957.040964] __schedule+0x21b/0x550
> [16957.041016] schedule+0x2d/0x70
> [16957.041018] io_schedule+0x42/0x70
> [16957.041020] folio_wait_bit+0xe9/0x200
> [16957.041052] ? find_get_pages_range_tag+0x199/0x1e0
> [16957.041056] ? __pfx_wake_page_function+0x10/0x10
> [16957.041059] folio_wait_writeback+0x28/0x80
> [16957.041068] __filemap_fdatawait_range+0x7b/0x110
> [16957.041079] filemap_write_and_wait_range+0x88/0xb0
> [16957.041093] nfs_wb_all+0x22/0x130 [nfs]
> [16957.041301] nfs_file_flush+0x63/0x80 [nfs]
> [16957.041333] filp_close+0x2f/0x70
> [16957.041371] __x64_sys_close+0xd/0x50
> [16957.041373] do_syscall_64+0x59/0x90
> [16957.041386] ? syscall_exit_work+0x103/0x130
> [16957.041413] ? syscall_exit_to_user_mode+0x22/0x40
> [16957.041416] ? do_syscall_64+0x69/0x90
> [16957.041418] ? do_syscall_64+0x69/0x90
> [16957.041427] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [16957.041442] RIP: 0033:0x7fc936e234a4
> [16957.041565] RSP: 002b:00007ffd2ebb91c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
> [16957.041572] RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007fc936e234a4
> [16957.041574] RDX: 00007ffd2ebb9660 RSI: 0000000000000004 RDI: 0000000000000004
> [16957.041575] RBP: 00007ffd2ebb9450 R08: 0000000000000000 R09: 0000000000000000
> [16957.041577] R10: 00000000016bb010 R11: 0000000000000246 R12: 0000000000402950
> [16957.041578] R13: 00007ffd2ebba180 R14: 0000000000000000 R15: 0000000000000000
> [16957.041582] </TASK>
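When repeating the test, it may help to pull the hung-task reports out of the kernel log automatically. The helper below is a minimal sketch; the function name `hung_tasks` is made up for illustration, and it only matches the `INFO: task ... blocked for more than ... seconds.` line format seen in the trace above.

```shell
# Read kernel log lines on stdin and print "command pid seconds"
# for every hung-task report found.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds\..*/\1 \2 \3/p'
}
```

For example, `dmesg | hung_tasks` on the affected client would print `cp 7183 122` for the report shown above.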
is cloned by:
- RHEL-67303 NFS with TLS: Dropping to TCP Zero Window against NetApp Server (New)
- RHEL-67304 NFS with TLS: Hang during TLS handshake when performing LIF migration (HA failover) (New)