OpenShift Virtualization / CNV-23703

[2156753] LVM created from the VMs are getting activated in the OCP nodes


    • Sprint: Storage Core Sprint 246
    • Priority: Urgent

      Description of problem:

      LVM stores its metadata on the disks themselves and, by default, works with every device under /dev/: it scans each one looking for LVM metadata and exposes whatever LVM devices that metadata describes. When a Ceph RBD storage class is used, the node has a loop device linked to the Persistent Volume.

      ~~~
      /dev/loop2: [0006]:51168221 (/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/233e0255-280d-4676-a868-8857ec4ad282)
      ~~~

      Although the LVM volume was created from inside the VM, the loop device is linked directly to the RBD device, so LVM on the node finds the LVM metadata written by the VM when it scans these loop devices.

      ~~~
      worker-0 ~]# pvs
      PV         VG  Fmt  Attr PSize   PFree
      /dev/loop2 vg1 lvm2 a--  <30.00g <29.00g
      ~~~
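
      For context only, LVM's scanning scope can be narrowed with a filter in the node's /etc/lvm/lvm.conf. The pattern below is just an illustrative sketch of rejecting loop devices; it is not a change that was applied or validated here.

      ~~~
      # /etc/lvm/lvm.conf on the node -- illustrative sketch only
      devices {
          # Reject loop devices so metadata written by guests onto
          # kubelet-managed block PVs is not picked up by the node's LVM.
          global_filter = [ "r|^/dev/loop|" ]
      }
      ~~~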

      Event activation is enabled by default in LVM, so these discovered LVs are auto-activated.

      ~~~
      Dec 28 11:34:50 worker-0.ocp4.shiftvirt.com lvm[27124]: pvscan[27124] PV /dev/loop2 online, VG vg1 is complete.
      Dec 28 11:34:50 worker-0.ocp4.shiftvirt.com lvm[27124]: pvscan[27124] VG vg1 run autoactivation.
      Dec 28 11:34:50 worker-0.ocp4.shiftvirt.com lvm[27124]: pvscan[27124] PVID blLzdH-Ww48-4HqC-qYF5-dkSf-ArRd-jyiGxc read from /dev/loop2 last written to /dev/vdc
      Dec 28 11:34:50 worker-0.ocp4.shiftvirt.com lvm[27124]: 1 logical volume(s) in volume group "vg1" now active
      ~~~
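
      For reference, the auto-activation seen in the journal is governed by the global/event_activation option in lvm.conf. The snippet below only sketches how it could be disabled on the node; it has not been tested as a workaround here.

      ~~~
      # /etc/lvm/lvm.conf on the node -- illustrative sketch only
      global {
          # 1 (default): pvscan auto-activates a VG as soon as it is complete,
          # which is what activates vg1/lv1 above. 0 disables event-based activation.
          event_activation = 0
      }
      ~~~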

      This creates a device-mapper (dm) device on top of the loop device.

      ~~~
      dmsetup ls --tree
      vg1-lv1 (253:0)
      `- (7:2)
      ~~~
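
      The same stacking can be inspected directly on the node, for example (commands only, output omitted):

      ~~~
      # vg1-lv1 shows up as a child of the loop device
      lsblk /dev/loop2
      # the dm device currently holding the loop device
      ls /sys/block/loop2/holders/
      # details of the auto-activated LV mapping
      dmsetup info vg1-lv1
      ~~~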

      During VM power down, when kubelet tries to unmap the Persistent Volume, the unmap fails because the dm device is still active on top of the linked loop device, which prevents the unmount.

      ~~~
      Dec 28 11:49:06 worker-0.ocp4.shiftvirt.com hyperkube[2200]: E1228 11:49:06.274235 2200 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com^0001-0011-openshift-storage-0000000000000002-82a7d8eb-869b-11ed-8332-0a580a81016d podName:ae3b79e4-e551-4338-a8b4-f2f18e8c1469 nodeName:}" failed. No retries permitted until 2022-12-28 11:49:38.274206068 +0000 UTC m=+2048.314118469 (durationBeforeRetry 32s). Error: UnmapVolume.UnmapBlockVolume failed for volume "disk-integral-boar" (UniqueName: "kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com^0001-0011-openshift-storage-0000000000000002-82a7d8eb-869b-11ed-8332-0a580a81016d") pod "ae3b79e4-e551-4338-a8b4-f2f18e8c1469" (UID: "ae3b79e4-e551-4338-a8b4-f2f18e8c1469") : blkUtil.DetachFileDevice failed. globalUnmapPath:/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev, podUID: ae3b79e4-e551-4338-a8b4-f2f18e8c1469, bindMount: true: failed to unmount linkPath /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/ae3b79e4-e551-4338-a8b4-f2f18e8c1469: unmount failed: exit status 32

      Dec 28 11:49:06 worker-0.ocp4.shiftvirt.com hyperkube[2200]: Output: umount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/ae3b79e4-e551-4338-a8b4-f2f18e8c1469: target is busy.
      ~~~

      This causes the virt-launcher pod to be stuck in "Terminating" status indefinitely.

      Manual umount will also fail:

      ~~~
      # umount /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/ae3b79e4-e551-4338-a8b4-f2f18e8c1469
      umount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/ae3b79e4-e551-4338-a8b4-f2f18e8c1469: target is busy.
      ~~~
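
      One way to confirm that the dm device is what keeps the target busy (a sketch using the same paths and names as above):

      ~~~
      # map the busy mount target back to its source device
      findmnt /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/ae3b79e4-e551-4338-a8b4-f2f18e8c1469
      # vg1-lv1 still sits on top of the loop device (7:2)
      dmsetup ls --tree
      ~~~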

      Removing the dm device allows the unmount to succeed and the virt-launcher pod to be removed.

      ~~~
      # dmsetup remove vg1-lv1
      # umount /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/ae3b79e4-e551-4338-a8b4-f2f18e8c1469
      ~~~
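
      Deactivating the VG through LVM itself should have the same effect as the direct dmsetup call; this is an untested sketch reusing the vg1 name from the logs:

      ~~~
      # deactivate the auto-activated VG so the dm device is torn down
      vgchange -an vg1
      # then retry the unmount that was failing
      umount /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/ae3b79e4-e551-4338-a8b4-f2f18e8c1469
      ~~~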

      Version-Release number of selected component (if applicable):

      OpenShift Virtualization 4.11.1

      How reproducible:

      100%

      Steps to Reproduce:

      1. From inside the virtual machine, create an LVM volume group and logical volume on the Ceph RBD disk (a command sketch follows this list).
      2. On the node where the VM is running, run LVM commands. The LVM created from the VM is visible on the node and the LV is active.
      3. Shut down the VM. The virt-launcher pod gets stuck in Terminating status.
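
      A rough command-level sketch of these steps, assuming the RBD-backed disk appears as /dev/vdc inside the guest (as in the journal above) and reusing the vg1/lv1 names from the logs:

      ~~~
      # inside the VM: create a PV/VG/LV on the Ceph RBD-backed disk
      pvcreate /dev/vdc
      vgcreate vg1 /dev/vdc
      lvcreate -n lv1 -L 1G vg1

      # on the worker node running the VM: the guest's VG/LV are visible and active
      pvs
      lvs vg1

      # shut the VM down (for example with virtctl stop) and watch the
      # virt-launcher pod stay in Terminating
      ~~~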

      Actual results:

      LVM volumes created from inside the VMs are activated on the OCP nodes.

      Expected results:

      The LVM created from inside the VM should not be visible or active on the OCP node.

      Additional info:

              alitke@redhat.com Adam Litke
              rhn-support-nashok Nijin Ashok