  1. CoreOS OCP
  2. COS-3217

[openshift/os] Multiple mpath devices with boot label causes wrong boot device to get mounted on /boot


    • Type: Story
    • Resolution: Unresolved

      [2970174783] Upstream Reporter: Kenneth D'souza
      Upstream issue status: Open
      Upstream description:

      *Issue*: Multiple mpath devices with a boot label cause the wrong boot device to be mounted on /boot

      *Analysis*
      When more than one multipath (mpath) device carries the boot label, the wrong device can end up mounted on /boot.
      As a result, machine-config-daemon fails with the error below, because it cannot find the right ostree content.

      ```
      Failed to initialize single run daemon: error reading osImageURL from rpm-ostree: exit status 1
      ```

      This occurs on nodes used for OCP virtualization, where the VMs use the additional mpath disks directly as storage.
      Those disks are detected early in boot and, because of the logic below, the wrong mpath boot device gets mounted instead of the real boot device of the RHCOS node.
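
      To confirm that a node is affected, it can help to list every device that carries the boot label and compare the result with what is actually mounted on /boot. The sketch below is an illustration only and is not part of the upstream report: `boot_label_candidates` is a hypothetical helper that filters `lsblk -P` output, and the exact device names will differ per node.

      ```shell
      # Hypothetical helper: reads `lsblk -P -o NAME,TYPE,LABEL` output on stdin
      # and prints each distinct line whose filesystem label is exactly "boot".
      # The same mpath partition can be listed once per path, so duplicates are collapsed.
      boot_label_candidates() {
          grep 'LABEL="boot"' | sort -u
      }

      # Usage on a node (assumes lsblk and findmnt from util-linux are available):
      #   lsblk -P -o NAME,TYPE,LABEL | boot_label_candidates
      #   findmnt -no SOURCE /boot
      ```

      More than one line from the first command suggests that several devices compete for the boot label; the second command shows which one was mounted.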

      Below is the code that mounts /boot by label when it finds that root is on mpath.

      ```

      # If the root device is multipath, hook up /boot to use that too,
      # based on our custom udev rules in 90-coreos-device-mapper.rules
      # that creates "label found on mpath" links.
      # Otherwise, use the usual by-label symlink.
      # See discussion in https://github.com/coreos/fedora-coreos-config/pull/1022
      bootdev=/dev/disk/by-label/boot
      bootkarg=$(karg boot)
      mpath=$(karg rd.multipath)
      if [ -n "${mpath}" ] && [ "${mpath}" != 0 ]; then
          bootdev=/dev/disk/by-label/dm-mpath-boot
      # Newer nodes inject boot=UUID=..., but we support a larger subset of the dracut/fips API
      elif [ -n "${bootkarg}" ]; then
          # Adapted from https://github.com/dracutdevs/dracut/blob/9491e599282d0d6bb12063eddbd192c0d2ce8acf/modules.d/01fips/fips.sh#L17
          case "$bootkarg" in
              LABEL=* | UUID=* | PARTUUID=* | PARTLABEL=*)
                  bootdev="$(label_uuid_to_dev "$bootkarg")";;
              /dev/*) bootdev=$bootkarg;;
              *) echo "Unknown boot karg '${bootkarg}'; falling back to ${bootdev}";;
          esac
      # This is used for the first boot only
      elif [ -f /run/coreos/bootfs_uuid ]; then
          bootdev=/dev/disk/by-uuid/$(cat /run/coreos/bootfs_uuid)
      fi

      ```
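
      The precedence in the snippet above can be seen more easily as a pure function. The following is a minimal sketch, not the upstream code: `pick_bootdev` is a hypothetical name, the three inputs stand in for the `rd.multipath` karg, the `boot` karg, and the contents of /run/coreos/bootfs_uuid, and the handling of `UUID=` and `LABEL=` is a simplified stand-in for `label_uuid_to_dev`.

      ```shell
      # Hypothetical helper mirroring the if/elif ordering of the quoted script.
      # $1 = rd.multipath karg value, $2 = boot karg value, $3 = first-boot bootfs UUID
      pick_bootdev() {
          mpath=$1
          bootkarg=$2
          firstboot_uuid=$3
          bootdev=/dev/disk/by-label/boot
          if [ -n "$mpath" ] && [ "$mpath" != 0 ]; then
              # Multipath wins first: the boot karg is never consulted on this branch.
              bootdev=/dev/disk/by-label/dm-mpath-boot
          elif [ -n "$bootkarg" ]; then
              case "$bootkarg" in
                  UUID=*)  bootdev=/dev/disk/by-uuid/${bootkarg#UUID=} ;;
                  LABEL=*) bootdev=/dev/disk/by-label/${bootkarg#LABEL=} ;;
                  /dev/*)  bootdev=$bootkarg ;;
              esac
          elif [ -n "$firstboot_uuid" ]; then
              bootdev=/dev/disk/by-uuid/$firstboot_uuid
          fi
          echo "$bootdev"
      }

      # Example: with multipath enabled, a boot=UUID=... karg is ignored.
      #   pick_bootdev 1 UUID=abcd ""
      ```

      Under these assumptions, a node booted with `rd.multipath` set always resolves /boot through the dm-mpath-boot label link, so a second mpath device carrying the same label can be picked even when a `boot=UUID=...` karg is present.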
      *Current workaround* (not fully tested):

      ```
      # Make sure the real boot device of the RHCOS node is mounted

      $ rpm-ostree kargs --delete root --append root='UUID=<UUID of root>'

      $ systemctl reboot
      ```
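
      After the reboot, one way to check that the new argument took effect is to read `root=` back from the kernel command line. The helper below is a sketch under assumptions: `karg_from` is a hypothetical function, loosely modeled on the `karg` helper that the quoted script calls, whose real implementation is not shown in this report.

      ```shell
      # Hypothetical helper: print the value of the last NAME=... argument
      # found in a kernel command line string.
      # $1 = argument name (for example root), $2 = command line text
      # The body runs in a subshell so that `set -f` and the variables stay local.
      karg_from() (
          set -f    # disable globbing while the command line is split into words
          name=$1
          value=
          for arg in $2; do
              case "$arg" in
                  "$name"=*) value=${arg#"$name"=} ;;
              esac
          done
          echo "$value"
      )

      # Usage on a node:
      #   karg_from root "$(cat /proc/cmdline)"
      ```

      If the printed value is not the `UUID=...` string that was appended, the kargs change has not been applied to the booted deployment.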
