OpenShift Bugs: OCPBUGS-61669

[OCP 4.18] coreos-boot-disk link not working with multipath on early boot


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Version: 4.18.z
    • Component: RHCOS
    • Quality / Stability / Reliability
    • Sprint: CoreOS West - Sprint 277, CoreOS West - Sprint 278

      Description of problem:

      When performing an Assisted Installer deployment (3 bare-metal nodes) with multipath enabled from the start, and placing /var on a separate partition of the same root (multipath) disk, the installation fails with the following error:
      [   14.806929] slabnode2219.sl712cluster.slocp.netact.net ignition[3210]: disks: createPartitions: op(1): [started]  waiting for devices [/dev/disk/by-id/coreos-boot-disk]
      [  104.936043] slabnode2219.sl712cluster.slocp.netact.net systemd[1]: dev-disk-by\x2did-coreos\x2dboot\x2ddisk.device: Job dev-disk-by\x2did-coreos\x2dboot\x2ddisk.device/start timed out.
      [  104.936386] slabnode2219.sl712cluster.slocp.netact.net systemd[1]: Timed out waiting for device /dev/disk/by-id/coreos-boot-disk.
      [  104.984198] slabnode2219.sl712cluster.slocp.netact.net systemd[1]: dev-disk-by\x2did-coreos\x2dboot\x2ddisk.device: Job dev-disk-by\x2did-coreos\x2dboot\x2ddisk.device/start failed with result 'timeout'.
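      The systemd unit name in these log lines is the escaped form of the symlink path that Ignition is waiting for. The sketch below is a simplified re-implementation of the escaping rule that `systemd-escape --path` applies, assuming an ordinary absolute path; it is not systemd's own code, and `escape_path_unit` is a hypothetical helper name.

```python
import string

# Characters systemd leaves as-is in unit names ("/" is handled separately).
_SAFE = set(string.ascii_letters + string.digits + ":_.")


def escape_path_unit(path: str, suffix: str = "device") -> str:
    """Approximate `systemd-escape --path --suffix=SUFFIX PATH`.

    Simplified: only collapses duplicate slashes and strips leading/trailing
    "/"; it does not resolve "." or ".." path components.
    """
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "-." + suffix  # the root directory escapes to "-"
    out = []
    for i, byte in enumerate("/".join(parts).encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")  # path separators become dashes
        elif ch in _SAFE and not (i == 0 and ch == "."):
            out.append(ch)  # safe characters pass through (except a leading ".")
        else:
            # Everything else, including a literal "-", becomes \xNN.
            out.append("\\x%02x" % byte)
    return "".join(out) + "." + suffix


if __name__ == "__main__":
    print(escape_path_unit("/dev/disk/by-id/coreos-boot-disk"))
```

      The same rule explains the mount unit name in the manifest under Steps to Reproduce: systemd requires a mount unit to be named after its escaped mount point, so `Where=/var` must live in `var.mount`.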
      

      Version-Release number of selected component (if applicable):

      4.18.21

      How reproducible:

      Always    

      Steps to Reproduce:

          1. Prepare a 3-node bare-metal cluster with multipath for deployment with the on-premise Assisted Installer.
          2. Add the following manifest to place /var on a separate partition:
      ~~~
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: 96-workers-var
      spec:
        config:
          ignition:
            version: 3.2.0
          storage:
            disks:
            - device: /dev/disk/by-id/coreos-boot-disk
              wipe_table: false
              partitions:
              - label: root
                number: 4
                resize: true
                sizeMiB: 10240
              - label: var
                number: 5
                sizeMiB: 0
            filesystems:
              - device: /dev/disk/by-partlabel/var
                path: /var
                format: xfs
                label: var
          systemd:
            units:
              - name: var.mount 
                enabled: true
                contents: |
                  [Unit]
                  Before=local-fs.target
                  [Mount]
                  What=/dev/disk/by-partlabel/var
                  Where=/var
                  Options=defaults,prjquota 
                  [Install]
                  WantedBy=local-fs.target
      ~~~
          3. Load the ISO and start the deployment. After the initial reboot, the 2 non-bootstrap masters drop to the emergency shell.
          

      Actual results:

      Multipath is correctly initialized during early boot, and "/dev/disk/by-id/coreos-boot-disk" points to one of the path components of the root multipath device. Even so, systemd keeps waiting for the device to appear, which looks like a condition that can never be met, until the start job times out.
      The cluster can be installed without /var segregation, but whenever the manifest is included, the installation fails.
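      One way to confirm this on an affected node is to resolve the symlink and check whether the resulting disk is held by a device-mapper multipath map. This is a minimal sketch assuming the standard sysfs layout (`/sys/class/block/<name>/holders`, `dm/uuid`, `dm/name`) and the `mpath-` UUID prefix that device-mapper-multipath gives its maps; `multipath_holder` is a hypothetical helper. Python is probably not available in the initramfs emergency shell, where `readlink -f` followed by `ls /sys/class/block/<name>/holders` gives the same information by hand, if those tools are present.

```python
import os
from typing import Optional


def multipath_holder(link: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the name of the multipath map holding the device behind `link`.

    Returns None when the resolved device is not a path component of a
    multipath map (e.g. a plain local disk, or an LVM holder instead).
    """
    # Resolve the udev symlink, e.g. /dev/disk/by-id/coreos-boot-disk -> /dev/sda.
    kname = os.path.basename(os.path.realpath(link))
    block = os.path.join(sysfs_root, "class", "block")
    holders_dir = os.path.join(block, kname, "holders")
    if not os.path.isdir(holders_dir):
        return None
    for holder in sorted(os.listdir(holders_dir)):
        try:
            with open(os.path.join(block, holder, "dm", "uuid")) as f:
                uuid = f.read().strip()
        except OSError:
            continue  # not a device-mapper holder, or not readable
        # Assumption: multipath maps carry a dm UUID starting with "mpath-".
        if uuid.startswith("mpath-"):
            try:
                with open(os.path.join(block, holder, "dm", "name")) as f:
                    return f.read().strip()  # map name, e.g. "mpatha"
            except OSError:
                return holder  # fall back to the kernel name, e.g. "dm-0"
    return None


if __name__ == "__main__":
    target = "/dev/disk/by-id/coreos-boot-disk"
    mpath = multipath_holder(target)
    if mpath:
        print(f"{target} resolves to a path component of multipath map {mpath}")
    else:
        print(f"{target} is not a multipath path component")
```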

      Expected results:

      Even though "/dev/disk/by-id/coreos-boot-disk" points to one of the path components instead of the "mpath" device, partitioning completes and the boot proceeds.

      Additional info:

      We tested the manifest in a lab with only a local disk, and it works without issues. We also tested the whole installation without /var segregation, and it completes successfully. Only the combination of /var segregation and multipath fails, and it does so consistently.

       

              People: Tiago Bueno (tbueno@redhat.com), Mario Abajo Duran (rhn-support-mabajodu), Michael Nguyen
