OpenShift Virtualization / CNV-23703

[2156753] LVM created from the VMs are getting activated in the OCP nodes


    • Sprint: Storage Core Sprint 246
    • Priority: Urgent

      Description of problem:

      LVM stores its metadata on the disks themselves and, by default, works with every device under /dev/: it scans each one looking for LVM metadata and exposes whatever LVM devices that metadata describes. When a Ceph RBD storage class is used, the node has a loop device linked to the Persistent Volume.

      ~~~
      /dev/loop2: [0006]:51168221 (/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/233e0255-280d-4676-a868-8857ec4ad282)
      ~~~

      Although the LVM volume was created from inside the VM, the loop device is linked directly to the RBD device, so LVM on the node finds the LVM metadata written by the VM when it scans these loop devices.

      ~~~
      worker-0 ~]# pvs
      PV         VG  Fmt  Attr PSize   PFree
      /dev/loop2 vg1 lvm2 a--  <30.00g <29.00g
      ~~~
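
      For context only, LVM's scanning scope can be narrowed with a filter in the node's /etc/lvm/lvm.conf. The pattern below is just an illustrative sketch of rejecting loop devices; it is not a change that was applied or validated here.

      ~~~
      # /etc/lvm/lvm.conf on the node -- illustrative sketch only
      devices {
          # Reject loop devices so metadata written by guests onto
          # kubelet-managed block PVs is not picked up by the node's LVM.
          global_filter = [ "r|^/dev/loop|" ]
      }
      ~~~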

      Event activation is enabled by default in LVM, so these discovered LVs are auto-activated.

      ~~~
      Dec 28 11:34:50 worker-0.ocp4.shiftvirt.com lvm[27124]: pvscan[27124] PV /dev/loop2 online, VG vg1 is complete.
      Dec 28 11:34:50 worker-0.ocp4.shiftvirt.com lvm[27124]: pvscan[27124] VG vg1 run autoactivation.
      Dec 28 11:34:50 worker-0.ocp4.shiftvirt.com lvm[27124]: pvscan[27124] PVID blLzdH-Ww48-4HqC-qYF5-dkSf-ArRd-jyiGxc read from /dev/loop2 last written to /dev/vdc
      Dec 28 11:34:50 worker-0.ocp4.shiftvirt.com lvm[27124]: 1 logical volume(s) in volume group "vg1" now active
      ~~~
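
      For reference, the auto-activation seen in the journal is governed by the global/event_activation option in lvm.conf. The snippet below only sketches how it could be disabled on the node; it has not been tested as a workaround here.

      ~~~
      # /etc/lvm/lvm.conf on the node -- illustrative sketch only
      global {
          # 1 (default): pvscan auto-activates a VG as soon as it is complete,
          # which is what activates vg1/lv1 above. 0 disables event-based activation.
          event_activation = 0
      }
      ~~~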

      This creates a device-mapper (dm) device on top of the loop device.

      ~~~
      dmsetup ls --tree
      vg1-lv1 (253:0)
      `- (7:2)
      ~~~
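
      The same stacking can be inspected directly on the node, for example (commands only, output omitted):

      ~~~
      # vg1-lv1 shows up as a child of the loop device
      lsblk /dev/loop2
      # the dm device currently holding the loop device
      ls /sys/block/loop2/holders/
      # details of the auto-activated LV mapping
      dmsetup info vg1-lv1
      ~~~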

      During VM power down, when kubelet tries to unmap the Persistent Volume, the unmap fails because the dm device is still active on top of the linked loop device, which prevents the unmount.

      ~~~
      Dec 28 11:49:06 worker-0.ocp4.shiftvirt.com hyperkube[2200]: E1228 11:49:06.274235 2200 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com^0001-0011-openshift-storage-0000000000000002-82a7d8eb-869b-11ed-8332-0a580a81016d podName:ae3b79e4-e551-4338-a8b4-f2f18e8c1469 nodeName:}" failed. No retries permitted until 2022-12-28 11:49:38.274206068 +0000 UTC m=+2048.314118469 (durationBeforeRetry 32s). Error: UnmapVolume.UnmapBlockVolume failed for volume "disk-integral-boar" (UniqueName: "kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com^0001-0011-openshift-storage-0000000000000002-82a7d8eb-869b-11ed-8332-0a580a81016d") pod "ae3b79e4-e551-4338-a8b4-f2f18e8c1469" (UID: "ae3b79e4-e551-4338-a8b4-f2f18e8c1469") : blkUtil.DetachFileDevice failed. globalUnmapPath:/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev, podUID: ae3b79e4-e551-4338-a8b4-f2f18e8c1469, bindMount: true: failed to unmount linkPath /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/ae3b79e4-e551-4338-a8b4-f2f18e8c1469: unmount failed: exit status 32

      Dec 28 11:49:06 worker-0.ocp4.shiftvirt.com hyperkube[2200]: Output: umount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/ae3b79e4-e551-4338-a8b4-f2f18e8c1469: target is busy.
      ~~~

      This causes the virt-launcher pod to be stuck in "Terminating" status indefinitely.

      Manual umount will also fail:

      ~~~
      # umount /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/ae3b79e4-e551-4338-a8b4-f2f18e8c1469
      umount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/ae3b79e4-e551-4338-a8b4-f2f18e8c1469: target is busy.
      ~~~
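
      One way to confirm that the dm device is what keeps the target busy (a sketch using the same paths and names as above):

      ~~~
      # map the busy mount target back to its source device
      findmnt /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/ae3b79e4-e551-4338-a8b4-f2f18e8c1469
      # vg1-lv1 still sits on top of the loop device (7:2)
      dmsetup ls --tree
      ~~~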

      Removing the dm device allows the unmount to succeed and the virt-launcher pod to be removed.

      ~~~
      # dmsetup remove vg1-lv1
      # umount /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/ae3b79e4-e551-4338-a8b4-f2f18e8c1469
      ~~~
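
      Deactivating the VG through LVM itself should have the same effect as the direct dmsetup call; this is an untested sketch reusing the vg1 name from the logs:

      ~~~
      # deactivate the auto-activated VG so the dm device is torn down
      vgchange -an vg1
      # then retry the unmount that was failing
      umount /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-d74dc943-7166-4eb9-9d3e-d6e6d220e0f0/dev/ae3b79e4-e551-4338-a8b4-f2f18e8c1469
      ~~~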

      Version-Release number of selected component (if applicable):

      OpenShift Virtualization 4.11.1

      How reproducible:

      100%

      Steps to Reproduce:

      1. From inside the virtual machine, create an LVM volume group and logical volume on the Ceph RBD disk (a command sketch follows this list).
      2. On the node where the VM is running, run LVM commands. The LVM created from the VM is visible on the node and the LV is active.
      3. Shut down the VM. The virt-launcher pod gets stuck in Terminating status.
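
      A rough command-level sketch of these steps, assuming the RBD-backed disk appears as /dev/vdc inside the guest (as in the journal above) and reusing the vg1/lv1 names from the logs:

      ~~~
      # inside the VM: create a PV/VG/LV on the Ceph RBD-backed disk
      pvcreate /dev/vdc
      vgcreate vg1 /dev/vdc
      lvcreate -n lv1 -L 1G vg1

      # on the worker node running the VM: the guest's VG/LV are visible and active
      pvs
      lvs vg1

      # shut the VM down (for example with virtctl stop) and watch the
      # virt-launcher pod stay in Terminating
      ~~~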

      Actual results:

      LVM volumes created from inside the VMs are activated on the OCP nodes.

      Expected results:

      The LVM created from inside the VM should not be visible or active on the OCP node.

      Additional info:

              alitke@redhat.com Adam Litke
              rhn-support-nashok Nijin Ashok