RHEL-47063

Multipath Updates - Q4 2024: Upstream

    • Type: Task
    • Resolution: Done
    • Priority: Major
    • rhel-sst-logical-storage
    • ssg_filesystems_storage_and_HA

      This task is primarily meant to capture the upstream work, which consists of development activities and patch review that happen outside the RHEL process.  This round, bold items indicate the conclusion of a specific, longer-running task which is delineated by a leading [#].  These could be considered for up-level reporting or Quarterly Planning summaries.

      Oct 1st, 2024

      • An Ubuntu user reported an issue with multipath-backed bcache devices not assembling correctly. I can make it work fine in Fedora, so it is most likely a bcache udev rules issue.
      • More discussions about device persistent naming.

      Oct 15th, 2024

      • [1] Posted v3, v4 & v5 of my multipath checker update patchset. I’m pretty sure I’ve dealt with all of Martin’s issues.
      • [2] Talked to Vivek and Kevin Wolf about multipath SG_IO handling. If they can get customers to configure multipath in failover mode (with one path per pathgroup, so no load-balancing), they can get this working by simply sending a read request to the device whenever an SG_IO call fails with a transport-related error.
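
      The retry idea in [2] can be sketched roughly as follows. This is only an illustration, not the code discussed with Vivek and Kevin: the function name, the `send_sg_io` and `read_probe` callables, the retry limit, and the set of errno values treated as transport-related are all assumptions made for the example. The read is issued because, presumably, an ordinary read goes through the normal multipath I/O path and lets the device fail over before the SG_IO is tried again.

```python
import errno

# Assumption: which errors count as "transport-related". A real caller would
# more likely inspect the SG_IO host/driver status than a plain errno.
TRANSPORT_ERRNOS = {errno.EIO, errno.ENOLINK, errno.ETIMEDOUT}


def sg_io_with_failover(send_sg_io, read_probe, max_retries=3):
    """Retry an SG_IO call, issuing a read to the device between attempts.

    send_sg_io: hypothetical callable that issues the ioctl and raises
                OSError on failure.
    read_probe: hypothetical callable that issues an ordinary read to the
                multipath device.
    """
    for attempt in range(max_retries + 1):
        try:
            return send_sg_io()
        except OSError as err:
            # Give up on non-transport errors or when retries are exhausted.
            if err.errno not in TRANSPORT_ERRNOS or attempt == max_retries:
                raise
            # Send a normal read so the device has a chance to switch
            # pathgroups before the SG_IO is retried.
            read_probe()
```

      With one path per pathgroup there is a single active path at a time, which is why this only makes sense in failover mode.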

      Oct 29th, 2024

      • More work on multipath with OpenStack.
        • Most of the issues they are encountering are due to problems with the CSI (Container Storage Interface) driver that Infinidat wrote. These CSI drivers are things that individual storage vendors need to write. We apparently don’t provide a reference implementation or a template that would only require minimal work for most arrays. This seems like a mistake.
      • Wrote a multipath udev rules patch to avoid triggering blkid every time multipathd updates its paths. Waiting for feedback from Martin and Peter.
      • [2] More discussions with Kevin Wolf about getting SG_IOs working better on multipath devices. He's really pushing for an interface for dm devices to handle their own control ioctls, and for the multipath target to be able to do its own path checking in-kernel. Still not sure the userspace solution is as hard as he thinks it is.
      • [3] Multipath doesn’t handle dm devices with no tables gracefully. Fixed it to ignore ones whose DM UUID doesn't start with “mpath-”. Working on making multipathd automatically delete devices that have no table and whose DM UUID does start with “mpath-”. These can apparently appear if something kills multipathd at the wrong time.
      • Added automatic restarts to the multipathd.service systemd unit file.
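
      The table handling described in [3] above amounts to a small decision rule, sketched here. This is a simplified illustration and not the multipath-tools code: the `Action` enum and `classify_dm_device` function are hypothetical names, and the sketch assumes any device that has both an “mpath-” UUID and a table is a normal multipath device.

```python
from enum import Enum

MPATH_UUID_PREFIX = "mpath-"


class Action(Enum):
    IGNORE = "ignore"   # not a multipath device, leave it alone
    REMOVE = "remove"   # leftover multipath device with no table
    MANAGE = "manage"   # normal multipath device


def classify_dm_device(uuid, has_table):
    """Decide what to do with a dm device given its DM UUID and table state."""
    # Devices without an "mpath-" UUID are not ours, table or no table.
    if not uuid or not uuid.startswith(MPATH_UUID_PREFIX):
        return Action.IGNORE
    # An "mpath-" device with no table is a leftover, for example from
    # multipathd being killed at the wrong time.
    if not has_table:
        return Action.REMOVE
    return Action.MANAGE
```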

      Nov 12th, 2024

      • [1] Made one last upstream-requested change to my path checker patchset. It’s finally been accepted.
      • Talked to the OpenShift people about making a simple CSI driver for multipath devices using Linux-IO/targetcli. Their response was that it probably wasn’t worth the work, since most vendors have already written CSI drivers. They think it would be more valuable to just document best practices for dealing with multipath devices and paths, so we can give that to vendors to audit their CSIs. I’ve pointed them at some documentation.
      • [2] Wrote up an explanation of why we want multipath SG_IO failover handling done in userspace, and how it could be done.

      Nov 26th, 2024

      • [3] Worked through two iterations of my patches fixing some multipath table handling code with Martin. It has expanded to a 12-patch set, and most of the patches are ACKed. I'll be sending what will hopefully be the last version once I hash out some details on one of the patches with Martin.
      • Fixed a regression with multipath’s handling of kpartx partitions.
      • Reviewed changes for the 0.11.0 multipath upstream release. Will pull that into Fedora when it's out. My fix from above came in very late in the cycle, but fixed a serious bug introduced in 0.10.0. Martin and I have talked about creating a stable branch in the upstream multipath repo, to make it easier for distributions to pull in just the release plus important fixes.
      • Upstream multipath reviews. Mostly OK.
      • Martin updated some code dealing with multipath's handling of the systemd watchdog timer. In reviewing that I noticed that multipath's handling of this timer didn't make any sense. Fixed this up after a couple of iterations.
      • Martin's fix for a tricky multipathd crash missed one of the ways it could occur. I've proposed a different, and much more foolproof, approach to fixing this.
      • Multipath bug reported on GitHub. It was user error, but multipath should handle their issue (they manually added a device to the WWIDs file instead of having multipath do it when it created the device, leading multipath to not claim the path devices in udev). Working on this now.
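
      For context on the watchdog item above: the notes do not say what was wrong with multipathd's handling, so the sketch below only shows the general systemd contract and is not multipathd's logic. systemd passes the watchdog timeout to a service in the WATCHDOG_USEC environment variable, and its documentation recommends sending a keep-alive at half that interval. The function name is hypothetical, and the WATCHDOG_PID check is left out for brevity.

```python
def watchdog_keepalive_seconds(environ):
    """Return how often to send WATCHDOG=1, or None if the watchdog is off.

    environ: a mapping such as os.environ.
    """
    raw = environ.get("WATCHDOG_USEC")
    if raw is None:
        return None  # watchdog not enabled for this service
    try:
        usec = int(raw)
    except ValueError:
        return None  # malformed value, treat as disabled
    if usec <= 0:
        return None
    # Notify at half the timeout, converted from microseconds to seconds.
    return usec / 2 / 1_000_000
```

      For example, a unit with a 30 second watchdog timeout would be pinged every 15 seconds.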

      Dec 10th, 2024

      • [3] All my table handling code is upstream.
      • Wrote a fix for the multipath crash reported on GitHub. Noticed two other issues and fixed those as well. All merged.

              jbrassow@redhat.com Jonathan Brassow