OpenShift Bugs / OCPBUGS-20134

Improve documentation procedure: enabling multipath on bare metal at install time



      Description of problem:

      The customer would like to add a second disk device to RHCOS nodes and mount it at '/var/lib/containers' by following the steps in article https://access.redhat.com/solutions/4952011.

      This second device is a multipath device with LVM on top, and they are trying to use it as a separate filesystem mounted at '/var/lib/containers'.

      This fails, with the start job for the LVM device timing out:

      Oct 04 20:55:03 masr8c3locp2w5.corp.du.ae systemd[1]: Found device /dev/mapper/container-vol.
      Oct 04 20:55:03 masr8c3locp2w5.corp.du.ae systemd[1]: Started LVM event activation on device 253:0.
      .....
      Oct 04 20:56:25 masr8c3locp2w5.corp.du.ae systemd[1]: dev-mapper-container-vol.device: Job dev-mapper-container-vol.device/start timed out.
      Oct 04 20:56:25 masr8c3locp2w5.corp.du.ae systemd[1]: Timed out waiting for device dev-mapper-container-vol.device.
      Oct 04 20:56:25 masr8c3locp2w5.corp.du.ae systemd[1]: Dependency failed for Make File System on /dev/mapper/container-vol.
      Oct 04 20:56:25 masr8c3locp2w5.corp.du.ae systemd[1]: Dependency failed for Mount /dev/mapper/container-vol to /var/lib/containers.
      Oct 04 20:56:25 masr8c3locp2w5.corp.du.ae systemd[1]: Dependency failed for CRI-O Auto Update Script.
      Oct 04 20:56:25 masr8c3locp2w5.corp.du.ae systemd[1]: crio-wipe.service: Job crio-wipe.service/start failed with result 'dependency'.
      Oct 04 20:56:25 masr8c3locp2w5.corp.du.ae systemd[1]: var-lib-containers.mount: Job var-lib-containers.mount/start failed with result 'dependency'.
      Oct 04 20:56:25 masr8c3locp2w5.corp.du.ae systemd[1]: systemd-mkfs@dev-mapper-container-vol.service: Job systemd-mkfs@dev-mapper-container-vol.service/start failed with result 'dependency'.
      Oct 04 20:56:25 masr8c3locp2w5.corp.du.ae systemd[1]: dev-mapper-container-vol.device: Job dev-mapper-container-vol.device/start failed with result 'timeout'. 
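Grouping the failed jobs in this excerpt by result makes the chain easier to read: only the device unit fails with 'timeout', while the mkfs service, the mount and crio-wipe fail with 'dependency' because they wait on that device. A minimal Python sketch that extracts this from journal text, assuming one entry per line in exactly the "Job <unit>/start failed with result '<result>'" format shown above (the regex is illustrative, not a general journal parser):

```python
import re

# Assumption: lines look like the excerpt in this report, e.g.
# "... systemd[1]: crio-wipe.service: Job crio-wipe.service/start failed with result 'dependency'."
JOB_FAILED = re.compile(r"systemd\[1\]: (\S+): Job \S+/start failed with result '(\w+)'")


def failed_jobs(journal_text: str) -> dict[str, str]:
    """Map each unit whose start job failed to the result systemd reported."""
    results: dict[str, str] = {}
    for line in journal_text.splitlines():
        match = JOB_FAILED.search(line)
        if match:
            unit, result = match.groups()
            results[unit] = result
    return results


def root_causes(results: dict[str, str]) -> list[str]:
    """Units that failed for a reason other than a failed dependency."""
    return [unit for unit, result in results.items() if result != "dependency"]


if __name__ == "__main__":
    excerpt = (
        "Oct 04 20:56:25 masr8c3locp2w5.corp.du.ae systemd[1]: crio-wipe.service: "
        "Job crio-wipe.service/start failed with result 'dependency'.\n"
        "Oct 04 20:56:25 masr8c3locp2w5.corp.du.ae systemd[1]: var-lib-containers.mount: "
        "Job var-lib-containers.mount/start failed with result 'dependency'.\n"
        "Oct 04 20:56:25 masr8c3locp2w5.corp.du.ae systemd[1]: dev-mapper-container-vol.device: "
        "Job dev-mapper-container-vol.device/start failed with result 'timeout'.\n"
    )
    print(root_causes(failed_jobs(excerpt)))
```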

      This LVM volume sits on top of multipath (mpath) devices:

      sdc                 8:32   0     2T  0 disk  
      └─mpatha          253:0    0     2T  0 mpath 
        └─container-vol 253:1    0     2T  0 lvm   
      sdd                 8:48   0     2T  0 disk  
      └─mpatha          253:0    0     2T  0 mpath 
        └─container-vol 253:1    0     2T  0 lvm   
      sde                 8:64   0     2T  0 disk  
      └─mpatha          253:0    0     2T  0 mpath 
        └─container-vol 253:1    0     2T  0 lvm   
      sdf                 8:80   0     2T  0 disk  
      └─mpatha          253:0    0     2T  0 mpath 
        └─container-vol 253:1    0     2T  0 lvm   
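The listing shows four SCSI paths (sdc, sdd, sde, sdf) that all resolve to the same multipath map, mpatha (253:0), with the container-vol logical volume (253:1) on top. A short Python sketch that groups such output by map, assuming the default tree-drawing output and the exact columns shown here (NAME MAJ:MIN RM SIZE RO TYPE, no mount points); it only makes the topology explicit and is not a substitute for `multipath -ll`:

```python
from collections import defaultdict

# Box-drawing characters lsblk uses for its device tree (default, non --ascii output).
TREE_GLYPHS = "└├─│ "


def paths_per_map(lsblk_tree: str) -> dict[str, list[str]]:
    """Return {mpath map name: [underlying disk names]} from lsblk tree output.

    Assumption: TYPE is the last whitespace-separated field on each line,
    which holds for the listing in this report because no mount points are shown.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    current_disk = None
    for line in lsblk_tree.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # blank or truncated line
        name, dev_type = fields[0].lstrip(TREE_GLYPHS), fields[-1]
        if dev_type == "disk":
            current_disk = name  # remember the path the next mpath child belongs to
        elif dev_type == "mpath" and current_disk is not None:
            groups[name].append(current_disk)
    return dict(groups)


if __name__ == "__main__":
    listing = """\
sdc                 8:32   0     2T  0 disk
└─mpatha          253:0    0     2T  0 mpath
  └─container-vol 253:1    0     2T  0 lvm
sdd                 8:48   0     2T  0 disk
└─mpatha          253:0    0     2T  0 mpath
  └─container-vol 253:1    0     2T  0 lvm
"""
    print(paths_per_map(listing))
```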

      My understanding is that the customer can follow the documented steps for enabling multipath during node installation (https://docs.openshift.com/container-platform/4.12/installing/installing_bare_metal/installing-bare-metal.html#rhcos-enabling-multipath_installing-bare-metal) and specify the device by its WWN in the /dev path to avoid naming issues with mpathX devices.

      Still, the linked procedure isn't completely clear to me, and I have a few questions:

      • Where must these commands (mpathconf and coreos-installer) be executed? Inside the ISO installer environment, meaning the customer needs to boot the system from the ISO?
      • If they are run inside the ISO installer, shouldn't the last step in the docs be to reboot the system? My understanding is that the node must reboot to pick up the Ignition config and join the cluster.
      • Where does the node get the Ignition config to join the cluster? I don't see --ignition-url in the coreos-installer command that specifies the multipath device to install the OS on.
      • How, if possible, would one define two different devices at RHCOS installation time? The main device would stay local, as it is now on the customer's node, and the second device would be a multipath device mounted at '/var/lib/containers'.
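For reference, the installer invocation these questions revolve around can be written out as an argument list. A minimal Python sketch, assuming the kernel arguments from the RHCOS multipath procedure (rd.multipath=default, root=/dev/disk/by-label/dm-mpath-root, rw) combined with the --ignition-url option used in the general bare metal install steps; whether that combination is the intended one is exactly what the questions ask, so treat it as an assumption. It mirrors the documented layout (OS installed onto the multipath device), not the local-disk-plus-multipath layout from the last question. The WWN and Ignition URL are hypothetical:

```python
import shlex


def multipath_install_argv(wwn: str, ignition_url: str) -> list[str]:
    """Build a coreos-installer argument list that targets a multipath disk by WWN.

    Assumption: kernel arguments taken from the RHCOS multipath procedure,
    plus --ignition-url as used elsewhere in the bare metal install docs.
    """
    if not wwn:
        raise ValueError("a WWN is required to address the multipath device")
    return [
        "coreos-installer", "install",
        f"/dev/disk/by-id/wwn-{wwn}",  # stable name instead of /dev/mapper/mpathX
        f"--ignition-url={ignition_url}",
        "--append-karg", "rd.multipath=default",
        "--append-karg", "root=/dev/disk/by-label/dm-mpath-root",
        "--append-karg", "rw",
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    argv = multipath_install_argv("0x600a0b80001234560000abcd00000001",
                                  "http://example.com/worker.ign")
    print(shlex.join(argv))
```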

       

      Version-Release number of selected component (if applicable):

      • OCP 4.12

      How reproducible:

      • Always, in the customer's environment

      Steps to Reproduce:

      1. Create a MachineConfig that declares an LVM device to be mounted at /var/lib/containers, following the steps from https://access.redhat.com/solutions/4952011
      2. Apply the MachineConfig to a node

       

      Actual results:

      • The node doesn't start properly: the LVM device job times out and /var/lib/containers is not mounted from the separate storage device

      Expected results:

      • The MachineConfig is applied to the node and the separate storage device (multipath + LVM) is mounted at /var/lib/containers

      Additional info:

      • MachineConfig spec:

       

      spec:
        config:
          ignition:
            version: 3.2.0
          systemd:
            units:
            - contents: |
                [Unit]
                Description=Make File System on /dev/mapper/container-vol
                DefaultDependencies=no
                BindsTo=dev-mapper-container-vol.device
                After=dev-mapper-container-vol.device var.mount
                Before=systemd-fsck@dev-mapper-container-vol.service
                [Service]
                Type=oneshot
                RemainAfterExit=yes
                ExecStart=-/bin/bash -c "/bin/rm -rf /var/lib/containers/*"
                ExecStart=/usr/lib/systemd/systemd-makefs xfs /dev/mapper/container-vol
                TimeoutSec=0
                [Install]
                WantedBy=var-lib-containers.mount
              enabled: true
              name: systemd-mkfs@dev-mapper-container-vol.service
            - contents: |
                [Unit]
                Description=Mount /dev/mapper/container-vol to /var/lib/containers
                Before=local-fs.target
                Requires=systemd-mkfs@dev-mapper-container-vol.service
                After=systemd-mkfs@dev-mapper-container-vol.service
                [Mount]
                What=/dev/mapper/container-vol
                Where=/var/lib/containers
                Type=xfs
                Options=defaults,prjquota
                [Install]
                WantedBy=local-fs.target
              enabled: true
              name: var-lib-containers.mount
            - contents: |
                [Unit]
                Description=Restore recursive SELinux security contexts
                DefaultDependencies=no
                After=var-lib-containers.mount
                Before=crio.service
                [Service]
                Type=oneshot
                RemainAfterExit=yes
                ExecStart=/sbin/restorecon -R /var/lib/containers/
                TimeoutSec=0
                [Install]
                WantedBy=multi-user.target graphical.target
              enabled: true
              name: restorecon-var-lib-containers.service 
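The unit names in this spec (dev-mapper-container-vol.device, systemd-mkfs@dev-mapper-container-vol.service, var-lib-containers.mount) are meant to correspond to device and mount paths. systemd derives such names with its path-escaping rule, the one `systemd-escape --path` applies: the leading slash is dropped, '/' becomes '-', and a literal '-' (or any character outside ASCII letters, digits, ':', '_' and '.') becomes a \xNN hex escape. A minimal Python approximation of that rule, assuming an absolute path without '..' components:

```python
import string

# Characters kept verbatim; '-' and '\' are always hex-escaped, '/' is handled separately.
_KEEP = frozenset(string.ascii_letters + string.digits + ":_.")


def systemd_escape_path(path: str) -> str:
    """Approximate `systemd-escape --path PATH`.

    Assumption: PATH is absolute with no '..' components; '.' components and
    repeated slashes are dropped, as systemd's path simplification does.
    """
    parts = [p for p in path.split("/") if p and p != "."]
    if not parts:
        return "-"  # the root directory escapes to a single dash
    trimmed = "/".join(parts)
    out = []
    for i, ch in enumerate(trimmed):
        if ch == "/":
            out.append("-")
        elif ch in _KEEP and not (i == 0 and ch == "."):
            out.append(ch)  # a leading '.' is escaped to avoid hidden unit names
        else:
            # Escape every byte of the UTF-8 encoding as \xNN (lowercase hex).
            out.extend(f"\\x{b:02x}" for b in ch.encode("utf-8"))
    return "".join(out)


def unit_name(path: str, suffix: str) -> str:
    """Build a unit name such as 'var-lib-containers.mount' from a path."""
    return f"{systemd_escape_path(path)}.{suffix}"


if __name__ == "__main__":
    print(unit_name("/var/lib/containers", "mount"))
    print(unit_name("/dev/mapper/container-vol", "device"))
```

Applied to /dev/mapper/container-vol, the rule yields dev-mapper-container\x2dvol.device rather than dev-mapper-container-vol.device, which may be worth comparing against the names used in the spec; whether this contributes to the timeout is not established in this report.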

       

       

      People: OCP DocsBot (ocp-docs-bot), Javier Coscia (rhn-support-jcoscia), Michael Nguyen