Type: Bug
Resolution: Obsolete
Priority: Normal
Fix Version/s: None
Affects Version/s: rhel-8.10
Component/s: kernel / Storage / Storage Drivers
Labels:
- FCP_NETAPP
- FCP_P0

Regression: None
Severity: Low
Pool Team: rhel-sst-storage-io
Sub-System Group: ssg_filesystems_storage_and_HA
Story Points: 3
Blocked: False
Blocked Reason: None
Product Documentation Required: None
Sprint: None
Preliminary Testing: None
Test Coverage: None
Experience:
Architecture: x86_64
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
Planning: None

While running pNATE for RHEL-8.10, the test failed during path validation.

After a failover and giveback, pnate-client-03 reports only 3 of the expected 4 paths to the NVMe-FC namespaces:

nvme list-subsys /dev/nvme2n1
nvme-subsys2 - NQN=nqn.1992-08.com.netapp:sn.dd2bb30cfa2a11ed8f2400a098cbcac6:subsystem.nvme_1
\
+- nvme0 fc traddr=nn-0x211600a098cbcac6:pn-0x213b00a098cbcac6 host_traddr=nn-0x2000f4c7aa065db5:pn-0x2100f4c7aa065db5 live optimized
+- nvme1 fc traddr=nn-0x211600a098cbcac6:pn-0x207d00a098cbcac6 host_traddr=nn-0x2000f4c7aa065db5:pn-0x2100f4c7aa065db5 live non-optimized
+- nvme4 fc traddr=nn-0x211600a098cbcac6:pn-0x200900a098cbcac6 host_traddr=nn-0x2000f4c7aa065db4:pn-0x2100f4c7aa065db4 live optimized
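
The path check described above can be sketched roughly as follows. This is a minimal illustration, not the pNATE implementation: it assumes the text layout of `nvme list-subsys` shown in this ticket (the layout differs between nvme-cli versions, and JSON output is likely more robust where available). The names `parse_paths`, `live_paths_per_host_port`, and `EXPECTED_PATHS` are hypothetical.

```python
import re
from collections import Counter

# Path lines copied from the `nvme list-subsys /dev/nvme2n1` output in this ticket.
SAMPLE = "\n".join([
    "nvme-subsys2 - NQN=nqn.1992-08.com.netapp:sn.dd2bb30cfa2a11ed8f2400a098cbcac6:subsystem.nvme_1",
    "+- nvme0 fc traddr=nn-0x211600a098cbcac6:pn-0x213b00a098cbcac6 host_traddr=nn-0x2000f4c7aa065db5:pn-0x2100f4c7aa065db5 live optimized",
    "+- nvme1 fc traddr=nn-0x211600a098cbcac6:pn-0x207d00a098cbcac6 host_traddr=nn-0x2000f4c7aa065db5:pn-0x2100f4c7aa065db5 live non-optimized",
    "+- nvme4 fc traddr=nn-0x211600a098cbcac6:pn-0x200900a098cbcac6 host_traddr=nn-0x2000f4c7aa065db4:pn-0x2100f4c7aa065db4 live optimized",
])

# Assumed line layout: "+- <ctrl> <transport> traddr=<..> host_traddr=<..> <state> [<ana>]"
PATH_RE = re.compile(
    r"^\s*\+-\s+(?P<ctrl>nvme\d+)\s+(?P<transport>\S+)"
    r"\s+traddr=(?P<traddr>\S+)\s+host_traddr=(?P<host_traddr>\S+)"
    r"\s+(?P<state>\S+)(?:\s+(?P<ana>\S+))?\s*$"
)

EXPECTED_PATHS = 4  # hypothetical constant; the ticket states 4 paths are expected


def parse_paths(text):
    """Return one dict per controller path line found in the text."""
    paths = []
    for line in text.splitlines():
        match = PATH_RE.match(line)
        if match:
            paths.append(match.groupdict())
    return paths


def live_paths_per_host_port(paths):
    """Count paths in the 'live' state, grouped by initiator (host) port."""
    return Counter(p["host_traddr"] for p in paths if p["state"] == "live")


if __name__ == "__main__":
    paths = parse_paths(SAMPLE)
    print(f"{len(paths)} of {EXPECTED_PATHS} expected paths present")
    for port, count in live_paths_per_host_port(paths).items():
        print(f"{port}: {count} live path(s)")
```

Grouping by `host_traddr` shows which initiator port is short a path: in the sample, the port ending in `5db4` has a single live path while the port ending in `5db5` has two.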

No issues with the FC LUNs using the same initiator ports:

# multipath -ll
3600a098038304267573f4d3778506432 dm-19 NETAPP,LUN C-Mode
size=80G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 11:0:3:5 sdbs 68:96  active ready running
| `- 9:0:3:5  sdac 65:192 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 9:0:2:5  sdi  8:128  active ready running
  `- 11:0:1:5 sdax 67:16  active ready running

Below are the logs for pnate-03:

http://people.redhat.com/mpatalan/.netapp/pnate-client-03_fc_fcnvme.4.18.0-526.el8.txt.gz

Ewan looked at the issue and provided the following info:

This looks like the problem:

Nov 29 22:56:34 pnate-client-03.sqe.lab.eng.bos.redhat.com sh[100597]:
Get discovery log page failed: -11
Nov 29 22:56:34 pnate-client-03.sqe.lab.eng.bos.redhat.com systemd[1]:
nvmf-connect@-~~device\x3dnone\ttransport\x3dfc\ttraddr\x3dnn-0x211600a098cbcac6:pn-0x210f00a098cbcac6\ttrsvcid\x3dnone\t~~-host-traddr\
\x3dnn-0x2000f4c7aa065db4:pn-0x2100f4c7aa065db4.service: Main process
exited, code=exited, status=11/n/a
Nov 29 22:56:34 pnate-client-03.sqe.lab.eng.bos.redhat.com systemd[1]:
nvmf-connect@-~~device\x3dnone\ttransport\x3dfc\ttraddr\x3dnn-0x211600a098cbcac6:pn-0x210f00a098cbcac6\ttrsvcid\x3dnone\t~~-host-traddr\
\x3dnn-0x2000f4c7aa065db4:pn-0x2100f4c7aa065db4.service: Failed with
result 'exit-code'.

Elsewhere in the log, we see two discovery controllers instantiated,
followed by the NVMe controllers that access the subsystem.

Around this time, though, the second controller is never instantiated.
It looks like the nvme-cli command failed. The exit status=11 appears to be
the -11 from the earlier "Get discovery log page" error. 11 is EAGAIN,
which is not generated by the NVMe/FC code. However...
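
To double-check the errno reading above, a small sketch like the following maps a numeric value of either sign to its symbolic name. It assumes, as the note suggests, that the exit status 11 is simply the magnitude of the -11 error; how nvme-cli derives its exit status is not confirmed here. `describe_errno` is a hypothetical helper. On Linux, EAGAIN is 11.

```python
import errno
import os


def describe_errno(value):
    """Return (symbolic name, message) for an errno given with either sign."""
    code = abs(value)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    # -11 is the value from the "Get discovery log page failed" message,
    # 11 is the exit status systemd reported for the nvmf-connect unit.
    for value in (-11, 11):
        print(value, describe_errno(value))
```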

commit 3e8721c6f1216aeb6fcd64cd61a86a8176308d3d
Author: Nilesh Javali <njavali@redhat.com>
Date: Mon Sep 18 10:51:12 2023 +0000

scsi: qla2xxx: Fix error code in qla2x00_start_sp()

JIRA: https://issues.redhat.com/browse/RHEL-9859

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git

commit e579b007eff3ff8d29d59d16214cd85fb9e573f7
Author: Dan Carpenter <dan.carpenter@linaro.org>
Date: Mon Jun 26 13:58:47 2023 +0300

scsi: qla2xxx: Fix error code in qla2x00_start_sp()

This should be negative -EAGAIN instead of positive. The callers treat
non-zero error codes the same so it doesn't really impact runtime beyond
some trivial differences to debug output.

Fixes: 80676d054e5a ("scsi: qla2xxx: Fix session cleanup hang")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/49866d28-4cfe-47b0-842b-78f110e61aab@moroto.mountain
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Nilesh Javali <njavali@redhat.com>

diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index 18409ada0fff..d8d27c1e182c 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -3913,7 +3913,7 @@ qla2x00_start_sp(srb_t *sp)
 
 	pkt = __qla2x00_alloc_iocbs(sp->qpair, sp);
 	if (!pkt) {
-		rval = EAGAIN;
+		rval = -EAGAIN;
 		ql_log(ql_log_warn, vha, 0x700c,
 		    "qla2x00_alloc_iocbs failed.\n");
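
The commit message's point about callers can be illustrated with a toy model. This is a Python stand-in for the C return-code convention, not driver code: every function name below is hypothetical, and the two caller styles are generic examples rather than actual qla2xxx call sites.

```python
import errno


def start_sp_before_fix(alloc_ok):
    """Hypothetical stand-in: positive EAGAIN when IOCB allocation fails."""
    return 0 if alloc_ok else errno.EAGAIN


def start_sp_after_fix(alloc_ok):
    """Hypothetical stand-in: negative -EAGAIN, the usual kernel convention."""
    return 0 if alloc_ok else -errno.EAGAIN


def caller_checks_nonzero(rval):
    """Caller style the commit message describes: any non-zero value is an error."""
    return rval != 0


def caller_checks_negative(rval):
    """Stricter caller style that only treats negative values as errors."""
    return rval < 0


if __name__ == "__main__":
    for name, func in (("before", start_sp_before_fix), ("after", start_sp_after_fix)):
        rval = func(alloc_ok=False)
        print(name, rval, caller_checks_nonzero(rval), caller_checks_negative(rval))
```

A non-zero check sees the failure with either sign, which matches the commit message's statement that runtime behavior is largely unaffected. Only a caller testing for a negative value would miss the positive code.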

How reproducible: Once

Please provide the package NVR for which bug is seen:

RHEL-8.10.0-20231121.1
kernel-4.18.0-526.el8

Steps to reproduce

run pNATE for NVMe-FC/FC

Assignee: Nilesh Javali

Reporter: Marco Patalano

Developer: Nilesh Javali

QA Contact: storage-qe

Votes: 0

Watchers: 6

Created: 2023/11/30 8:48 PM

Updated: 2024/10/01 2:22 PM

Resolved: 2024/05/07 1:03 PM
