Bug
Resolution: Unresolved
Major
CNV v4.16.6
Quality / Stability / Reliability
Critical
Description of problem:
qemu-pr-helper runs in the virt-handler pods, and qemu-kvm connects to it over a UNIX socket to perform the reservations. When virt-handler is restarted, a new qemu-pr-helper process is started with a new socket file. This breaks communication between qemu-kvm and qemu-pr-helper, so SCSI reservations from the guests fail.
When I straced the qemu-kvm process, I saw "EACCES" when it tried to connect to the pr-helper socket:
578027 06:52:35.418117 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 13<UNIX-STREAM:[675790106]> <0.000060>
578027 06:52:35.418304 connect(13<UNIX-STREAM:[675790106]>, {sa_family=AF_UNIX, sun_path="/var/run/kubevirt/daemons/pr/pr-helper.sock"}, 110) = -1 EACCES (Permission denied) <0.000093>
578027 06:52:35.418589 close(13<UNIX-STREAM:[675790106]>) = 0 <0.000037>
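To pick failures like the one above out of a longer strace log, a small filter along these lines may help. This is only a sketch: the function name `failed_pr_connects` is hypothetical, and the strace invocation in the comment is an assumption, since the exact flags used for the trace above are not stated.

```shell
#!/bin/sh
# Sketch: list failed connect() calls to the pr-helper socket in strace output.
# A log could be captured with, for example (assumed flags, not from the report):
#   strace -f -tt -T -yy -o qemu.strace -p <qemu-kvm pid>

# Reads the named files, or standard input when no file is given.
# Matches any connect() on a path ending in pr-helper.sock that returned -1
# with an errno name (EACCES, ECONNREFUSED, ENOENT, ...).
failed_pr_connects() {
    grep -E 'connect\(.*pr-helper\.sock.*= -1 E[A-Z]+' "$@"
}
```

For example, `failed_pr_connects qemu.strace` would print only the connect() lines that failed, which makes it easy to see whether the errors start right after the virt-handler restart.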
The socket file has the following permissions:
[root@openshift-worker-cygnus-0 ~]# ls -lZ /var/run/kubevirt/daemons/pr/pr-helper.sock
srwxr-xr-x. 1 root root system_u:object_r:container_var_run_t:s0 0 Jan 28 08:07 /var/run/kubevirt/daemons/pr/pr-helper.sock
When the virt-handler pod restarts, it does not correct the ownership and SELinux label of pr-helper.sock. That correction only happens in the "allocate" phase of the device plugin, which runs at VM startup.
Reservations work again if I manually correct the permissions:
[root@openshift-worker-cygnus-0 ~]# chown 107.107 /var/run/kubevirt/daemons/pr/pr-helper.sock
[root@openshift-worker-cygnus-0 ~]# chcon -t container_file_t /var/run/kubevirt/daemons/pr/pr-helper.sock
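The manual workaround above could be wrapped in a small check so it is only applied when the socket is actually wrong. This is a rough sketch, not a fix for the underlying bug: the function names are hypothetical, the expected owner (107:107) and SELinux type (container_file_t) are taken from the commands above, and it assumes GNU `stat` on an SELinux-enabled node.

```shell
#!/bin/sh
# Sketch of the manual workaround as a reusable check (run as root on the node).
# Expected values come from the chown/chcon commands in this report.
SOCK_DEFAULT=/var/run/kubevirt/daemons/pr/pr-helper.sock
WANT_OWNER=107:107
WANT_TYPE=container_file_t

# Print the SELinux type, i.e. the third field of a context such as
# system_u:object_r:container_var_run_t:s0
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# socket_ok OWNER CONTEXT: succeed only if both owner and type match.
socket_ok() {
    [ "$1" = "$WANT_OWNER" ] && [ "$(selinux_type "$2")" = "$WANT_TYPE" ]
}

# check_and_fix [SOCKET]: inspect the socket and repair it if needed.
check_and_fix() {
    sock=${1:-$SOCK_DEFAULT}
    owner=$(stat -c '%u:%g' "$sock") || return 2   # numeric uid:gid
    context=$(stat -c '%C' "$sock") || return 2    # SELinux context
    if socket_ok "$owner" "$context"; then
        echo "ok: $sock"
    else
        echo "fixing: $sock (owner=$owner context=$context)"
        chown "$WANT_OWNER" "$sock" && chcon -t "$WANT_TYPE" "$sock"
    fi
}
```

Calling `check_and_fix` with no argument on the affected node after a virt-handler restart would apply the same two commands as above, but only when the owner or label differs from the expected values.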
Version-Release number of selected component (if applicable):
OpenShift Virtualization 4.16.5
How reproducible:
100%
Steps to Reproduce:
1. Pass a disk to the VM with "reservation: true":
   - lun:
       bus: scsi
       reservation: true
2. Start the VM and, once it is running, restart the virt-handler pod. Then attempt a SCSI reservation from the VM; the reservation fails:
[root@rhe18-tomato-pigeon-93]# sg_persist --out --register --param-sark=0xABCDEFGH /dev/sda
  QEMU      QEMU HARDDISK     2.5+
  Peripheral device type: disk
PR out (Register): Aborted command
sg_persist failed: Aborted command
Actual results:
Restarting the virt-handler pod breaks the connection between qemu-kvm and qemu-pr-helper, and reservations stop working.
Expected results:
virt-handler may be restarted automatically (for example, during an upgrade or a configuration change), so a running VM should not lose its reservation capability across the restart. Losing it breaks the cluster applications running in the VM.
Additional info: