-
Bug
-
Resolution: Obsolete
-
Normal
-
None
-
rhel-9.0.0
-
None
-
Moderate
-
rhel-sst-storage-io
-
ssg_filesystems_storage_and_HA
-
1
-
False
-
-
None
-
None
-
None
-
None
-
If docs needed, set a value
-
-
All
-
None
Description of problem:
Request timeout issue with NVMe/TCP under high IO load
Version-Release number of selected component (if applicable):
RHEL 9.0 / 5.14.0-70.17.1.el9_0.x86_64
How reproducible:
Always
Steps to Reproduce:
1. RHEL 9 server connected to a storage product via the NVMe/TCP protocol.
2. Customer used spdk nvme target
3. Then start high IO load
Actual results:
Nvme-tcp connection issue
Expected results:
It should work normally under high IO load
Additional info:
~~~
[ 121.459789] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.20.53.135:4420
[ 121.460298] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 121.469848] nvme nvme0: creating 24 I/O queues.
[ 121.708457] nvme nvme0: mapped 24/0/0 default/read/poll queues.
[ 121.715034] nvme nvme0: new ctrl: NQN "nqn.2020-01.com.infinidat:20015-subsystem-19988", addr 172.XX.XX.XXX:4420
[ 121.730466] nvme nvme1: creating 24 I/O queues.
[ 121.968453] nvme nvme1: mapped 24/0/0 default/read/poll queues.
[ 121.974964] nvme nvme1: new ctrl: NQN "nqn.2020-01.com.infinidat:20015-subsystem-19988", addr 172.XX.XX.XX:4420
[ 2609.736192] nvme nvme0: queue 22: timeout request 0x38 type 4
[ 2609.736201] nvme nvme0: starting error recovery
[ 2609.737241] nvme nvme0: Reconnecting in 10 seconds...
[ 2619.987225] nvme nvme0: creating 24 I/O queues.
[ 2620.258490] nvme nvme0: mapped 24/0/0 default/read/poll queues.
[ 2620.263160] nvme nvme0: Successfully reconnected (1 attempt)
[ 2994.751516] nvme nvme1: queue 8: timeout request 0x6 type 4
[ 2994.751525] nvme nvme1: starting error recovery
[ 2994.752560] nvme nvme1: Reconnecting in 10 seconds...
[ 3005.004597] nvme nvme1: creating 24 I/O queues.
[ 3005.274160] nvme nvme1: mapped 24/0/0 default/read/poll queues.
[ 3005.274290] debugfs: Directory 'hctx0' with parent '/' already present!
[ 3005.274298] debugfs: Directory 'hctx1' with parent '/' already present!
[ 3005.274302] debugfs: Directory 'hctx2' with parent '/' already present!
[ 3005.274304] debugfs: Directory 'hctx3' with parent '/' already present!
[ 3005.274307] debugfs: Directory 'hctx4' with parent '/' already present!
[ 3005.274310] debugfs: Directory 'hctx5' with parent '/' already present!
[ 3005.274312] debugfs: Directory 'hctx6' with parent '/' already present!
[ 3005.274316] debugfs: Directory 'hctx7' with parent '/' already present!
[ 3005.274319] debugfs: Directory 'hctx8' with parent '/' already present!
[ 3005.274321] debugfs: Directory 'hctx9' with parent '/' already present!
[ 3005.274324] debugfs: Directory 'hctx10' with parent '/' already present!
[ 3005.274327] debugfs: Directory 'hctx11' with parent '/' already present!
[ 3005.274330] debugfs: Directory 'hctx12' with parent '/' already present!
[ 3005.274333] debugfs: Directory 'hctx13' with parent '/' already present!
[ 3005.274336] debugfs: Directory 'hctx14' with parent '/' already present!
[ 3005.274339] debugfs: Directory 'hctx15' with parent '/' already present!
[ 3005.274342] debugfs: Directory 'hctx16' with parent '/' already present!
[ 3005.274344] debugfs: Directory 'hctx17' with parent '/' already present!
[ 3005.274348] debugfs: Directory 'hctx18' with parent '/' already present!
[ 3005.274351] debugfs: Directory 'hctx19' with parent '/' already present!
[ 3005.274354] debugfs: Directory 'hctx20' with parent '/' already present!
[ 3005.274357] debugfs: Directory 'hctx21' with parent '/' already present!
[ 3005.274360] debugfs: Directory 'hctx22' with parent '/' already present!
[ 3005.274362] debugfs: Directory 'hctx23' with parent '/' already present!
[ 3005.277796] nvme nvme1: Successfully reconnected (1 attempt)
[ 3064.893442] nvme nvme0: queue 18: timeout request 0x37 type 4
[ 3064.893451] nvme nvme0: starting error recovery
[ 3064.894448] nvme nvme0: Reconnecting in 10 seconds...
[ 3075.144327] nvme nvme0: creating 24 I/O queues.
[ 3075.402443] nvme nvme0: mapped 24/0/0 default/read/poll queues.
[ 3075.407243] nvme nvme0: Successfully reconnected (1 attempt)
[ 3005.277796] nvme nvme1: Successfully reconnected (1 attempt)
[ 3064.893442] nvme nvme0: queue 18: timeout request 0x37 type 4
[ 3064.893451] nvme nvme0: starting error recovery
[ 3064.894448] nvme nvme0: Reconnecting in 10 seconds...
[ 3075.144327] nvme nvme0: creating 24 I/O queues.
[ 3075.402443] nvme nvme0: mapped 24/0/0 default/read/poll queues.
[ 3075.407243] nvme nvme0: Successfully reconnected (1 attempt)
..
[16392.473564] nvme nvme0: starting error recovery
[16392.474660] nvme nvme0: Reconnecting in 10 seconds...
[16402.722441] nvme nvme0: creating 24 I/O queues.
[16402.987520] nvme nvme0: mapped 24/0/0 default/read/poll queues.
[16402.993473] nvme nvme0: Successfully reconnected (1 attempt)
[17737.466878] nvme nvme1: queue 23: timeout request 0x8 type 4
[17737.466886] nvme nvme1: starting error recovery
[17737.467932] nvme nvme1: Reconnecting in 10 seconds...
[17747.714022] nvme nvme1: creating 24 I/O queues.
[17747.981289] nvme nvme1: mapped 24/0/0 default/read/poll queues.
..
[44207.566992] debugfs: Directory 'hctx22' with parent '/' already present!
[44207.566995] debugfs: Directory 'hctx23' with parent '/' already present!
[44207.571479] nvme nvme1: Successfully reconnected (1 attempt)
[44910.247638] nvme nvme0: queue 13: timeout request 0x5b type 4
[44910.247646] nvme nvme0: starting error recovery
[44910.248687] nvme nvme0: Reconnecting in 10 seconds...
[44920.496424] nvme nvme0: creating 24 I/O queues.
[44920.754444] nvme nvme0: mapped 24/0/0 default/read/poll queues.
[44920.760197] nvme nvme0: Successfully reconnected (1 attempt)
[49347.144379] nvme nvme1: queue 15: timeout request 0x52 type 4
[49347.144386] nvme nvme1: starting error recovery
[49347.145416] nvme nvme1: Reconnecting in 10 seconds...
[49357.402221] nvme nvme1: creating 24 I/O queues.
[49357.671423] nvme nvme1: mapped 24/0/0 default/read/poll queues.
[49357.671527] debugfs: Directory 'hctx0' with parent '/' already present!
~~~
This patch set looks relevant as the customer confirmed it's under high IO load
https://lore.kernel.org/all/20220519062617.39715-1-hare@suse.de/
https://lore.kernel.org/all/20220519062617.39715-2-hare@suse.de/
https://lore.kernel.org/all/20220519062617.39715-3-hare@suse.de/
https://lore.kernel.org/all/20220519062617.39715-4-hare@suse.de/