RHEL-8053

iSCSI initiator cannot recover after target restart with large number of sessions

      Description of problem:
      When an iSCSI initiator has a large number of sessions open with a target server (1024 in the example below), rebooting the target or having its firewall reject traffic for a short period (15-30s) leaves some or all of the iSCSI sessions in a broken state. The sessions cannot be logged out or logged in again with 'iscsiadm': "Logging out of session …" messages are printed, but the session and block device state does not change. The only way to clear the state is to reboot the initiator. In addition, 'iscsiadm' hangs when asked to print detailed session information.

      Version-Release number of selected component (if applicable):
      iscsi-initiator-utils.x86_64 6.2.1.4-4.git095f59c
      kernel.x86_64 4.18.0-372.26.1

      How reproducible:
      Easily reproducible on every attempt.

      Steps to Reproduce:

      Set up the target with the following script (it requires 1024*512M of space on /mnt, but the disk size can be reduced):

      #!/bin/bash
      # Defaults; each can be overridden through the environment.
      [[ -z $TARGETS ]] && TARGETS=1024
      [[ -z $BASEDIR ]] && BASEDIR="/mnt"
      [[ -z $BASENAME ]] && BASENAME="iqn.2022-10.com.example"

      yum install -y targetcli
      firewall-cmd --permanent --add-port=3260/tcp
      firewall-cmd --reload

      # Build one batch of targetcli commands. Each target gets a 512M fileio
      # backstore, one LUN, and an ACL for the initiator.
      cmds=""
      for tgt in $(seq "$TARGETS"); do
          disk="disk${tgt}"
          target="${BASENAME}:tgt${tgt}"
          cmds="${cmds}cd /backstores/fileio\n"
          cmds="${cmds}create ${disk} ${BASEDIR}/${disk} 512M\n"
          cmds="${cmds}cd /iscsi\n"
          cmds="${cmds}create ${target}\n"
          cmds="${cmds}cd /iscsi/${target}/tpg1/luns\n"
          cmds="${cmds}create /backstores/fileio/${disk}\n"
          cmds="${cmds}cd /iscsi/${target}/tpg1/acls\n"
          # IQN of the initiator used in this report
          cmds="${cmds}create iqn.2022-10.com.example:s26\n"
      done
      echo -e "$cmds" | targetcli
      systemctl restart target

      Create sessions on the initiator:
      iscsiadm -m discoverydb --type sendtargets --portal 10.1.7.25 --discover # replace 10.1.7.25 with target IP
      iscsiadm -m node --loginall=all
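
Before breaking the sessions, it may help to confirm that the expected number of them was actually created. The following is a minimal sketch, assuming the kernel exposes one `sessionN` entry per session under /sys/class/iscsi_session; the function name and the optional directory argument are hypothetical and exist only for illustration and testing.

```shell
#!/bin/bash
# Count iSCSI sessions by counting sessionN entries in sysfs.
# The optional first argument overrides the sysfs directory (hypothetical,
# used only so the function can be exercised against a test directory).
count_iscsi_sessions() {
    local base="${1:-/sys/class/iscsi_session}"
    local n=0 d
    for d in "$base"/session*; do
        # When nothing matches, the glob stays literal and -e fails.
        [[ -e $d ]] && n=$((n + 1))
    done
    echo "$n"
}
```

With the setup above, `count_iscsi_sessions` would be expected to print the value of TARGETS (1024 by default) once all logins have completed.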

      One way to put the sessions in a broken state is to simply reboot the target server.
      Another is to reject iSCSI packets on the target for a short interval, e.g. by running:
      iptables -A INPUT -p tcp --dport 3260 -j REJECT; sleep 30; iptables -D INPUT -p tcp --dport 3260 -j REJECT

      Actual results:
      The vast majority of iSCSI block devices on the initiator go from “running” into a “blocked” state (as per ‘/sys/block/sd*/device/state’), and after a while reach “transport-offline”.
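
The state transition described above can be watched by summarizing the per-device state files. This is a minimal sketch that only reads the '/sys/block/sd*/device/state' files mentioned in this report; the function name and the optional directory argument are hypothetical.

```shell
#!/bin/bash
# Print one line per distinct SCSI device state with its device count,
# e.g. "running 24" or "transport-offline 1000".
# The optional first argument overrides /sys/block (hypothetical, for testing).
summarize_scsi_states() {
    local base="${1:-/sys/block}"
    local f
    for f in "$base"/sd*/device/state; do
        [[ -r $f ]] && cat "$f"
    done | sort | uniq -c | awk '{print $2, $1}'
}
```

Running it under `watch -n 5` (after `export -f summarize_scsi_states`) is one way to follow devices moving from "running" through "blocked" to "transport-offline".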

      Running 'iscsiadm -m session -P3' hangs after printing the following output:
      [root@s26 ~]# iscsiadm -m session -P3
      iSCSI Transport Class version 2.0-870
      version 6.2.1.4-1
      Target: iqn.2022-10.com.example:tgt1 (non-flash)
      Current Portal: 10.1.7.25:3260,1
      Persistent Portal: 10.1.7.25:3260,1
      **********
      Interface:
      **********
      Iface Name: default
      Iface Transport: tcp
      Iface Initiatorname: iqn.2022-10.com.example:s26
      Iface IPaddress: 10.1.7.26
      Iface HWaddress: default
      Iface Netdev: default
      SID: 1

      When running the above command with strace, it seems to get stuck polling for a response:
      socket(AF_UNIX, SOCK_STREAM, 0) = 3
      connect(3, {sa_family=AF_UNIX, sun_path=@"ISCSIADM_ABSTRACT_NAMESPACE"}, 30) = 0
      write(3, "\r\0\0\0\0\0\0\0\1\0\0\0\0[...]", 16104) = 16104
      poll([{fd=3, events=POLLIN}], 1, 1000) = 0 (Timeout)

      Increasing the 'node.session.timeo.replacement_timeout' parameter in /etc/iscsi/iscsid.conf may allow some devices to return to a 'running' state (after which they can be used normally), but the system as a whole remains in a broken state.
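
For anyone trying the timeout change, the edit to iscsid.conf can be scripted. This is a sketch only: the function name is hypothetical, the value 300 in the usage note is an arbitrary illustration rather than a recommendation from this report, and GNU sed (`-i`, `-E`) is assumed.

```shell
#!/bin/bash
# Set node.session.timeo.replacement_timeout in an iscsid.conf-style file.
# Usage: set_replacement_timeout <seconds> [conf-file]
set_replacement_timeout() {
    local secs="$1"
    local conf="${2:-/etc/iscsi/iscsid.conf}"
    local key='node.session.timeo.replacement_timeout'
    local key_re='node\.session\.timeo\.replacement_timeout'
    if grep -Eq "^[[:space:]]*${key_re}[[:space:]]*=" "$conf"; then
        # Replace the existing (uncommented) setting in place.
        sed -i -E "s|^[[:space:]]*${key_re}[[:space:]]*=.*|${key} = ${secs}|" "$conf"
    else
        # No active setting found; append one.
        printf '%s = %s\n' "$key" "$secs" >> "$conf"
    fi
}
```

For example, `set_replacement_timeout 300` rewrites the default file. Settings in iscsid.conf are applied to node records created by later discoveries; records that already exist can be changed with `iscsiadm -m node -o update -n node.session.timeo.replacement_timeout -v 300`.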

      Expected results:
      The iSCSI sessions should either recover on their own, or it should at least be possible to reconnect them manually with a logout and login.

              cleech@redhat.com Chris Leech
              redhat@storpool.com StorPool Storage (Inactive)
              Chris Leech Chris Leech
              Zhaojuan Guo Zhaojuan Guo