Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: CNV v4.16.6
Component/s: Storage Ecosystem
Labels:
None

Activity Type:
Quality / Stability / Reliability
Story Points:
0.42
Blocked:
False
Blocked Reason:

None
Ready:
False
Component Fix Version(s):
None
Market:

Severity:
Critical

Regression:
None

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

PX Impact Score:

Description of problem:

qemu-pr-helper runs in the virt-handler pods, and qemu-kvm connects to it to perform SCSI persistent reservations. When virt-handler is restarted, a new qemu-pr-helper process starts with a new socket file. This breaks communication between qemu-kvm and qemu-pr-helper, so SCSI reservations from the guests fail.

Running strace on the qemu-kvm process shows "EACCES" when it tries to connect to the pr-helper socket:

578027 06:52:35.418117 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 13<UNIX-STREAM:[675790106]> <0.000060>
578027 06:52:35.418304 connect(13<UNIX-STREAM:[675790106]>, {sa_family=AF_UNIX, sun_path="/var/run/kubevirt/daemons/pr/pr-helper.sock"}, 110) = -1 EACCES (Permission denied) <0.000093>
578027 06:52:35.418589 close(13<UNIX-STREAM:[675790106]>) = 0 <0.000037>
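
The connect() failure in the trace can be checked without strace. The following is a minimal sketch, not part of KubeVirt: probe_unix_socket is a hypothetical helper name, and the socket path is the one from the trace. To reproduce EACCES it presumably has to run with the same UID and SELinux context as qemu-kvm; run as root on the node it may well succeed.

```python
import errno
import socket

# Socket path taken from the strace output in this report.
PR_HELPER_SOCK = "/var/run/kubevirt/daemons/pr/pr-helper.sock"


def probe_unix_socket(path):
    """Try to connect to a UNIX stream socket.

    Returns "ok" on success, otherwise the symbolic errno name
    (for example "EACCES" or "ENOENT").
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        try:
            sock.connect(path)
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


if __name__ == "__main__":
    print(PR_HELPER_SOCK, "->", probe_unix_socket(PR_HELPER_SOCK))
```

A result of "EACCES" matches the failure seen in the trace, while "ENOENT" or "ECONNREFUSED" would point at a missing or stale socket file instead.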

The socket file has the following ownership and SELinux context:

[root@openshift-worker-cygnus-0 ~]# ls -lZ /var/run/kubevirt/daemons/pr/pr-helper.sock
srwxr-xr-x. 1 root root system_u:object_r:container_var_run_t:s0 0 Jan 28 08:07 /var/run/kubevirt/daemons/pr/pr-helper.sock

When the virt-handler pod restarts, it does not correct the ownership and SELinux label of the new pr-helper.sock. That correction happens only during the "allocate" phase of the device plugin, which runs at VM startup.

Reservations work again if I manually correct the ownership and SELinux label:

[root@openshift-worker-cygnus-0 ~]# chown 107.107 /var/run/kubevirt/daemons/pr/pr-helper.sock
[root@openshift-worker-cygnus-0 ~]# chcon -t container_file_t /var/run/kubevirt/daemons/pr/pr-helper.sock
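
The same condition can be checked programmatically before and after a virt-handler restart. This is a rough sketch under stated assumptions: the expected values (UID/GID 107 and type container_file_t) are taken only from the manual workaround above, and socket_problems and selinux_type are hypothetical helper names, not KubeVirt functions.

```python
import os
import stat

# Expected values, assumed from the chown/chcon workaround in this report.
EXPECTED_UID = 107
EXPECTED_GID = 107
EXPECTED_SELINUX_TYPE = "container_file_t"


def selinux_type(path):
    """Return the SELinux type of path, or None if no label can be read."""
    try:
        raw = os.getxattr(path, "security.selinux")
    except (OSError, AttributeError):
        # No label, no such file, or a platform without os.getxattr.
        return None
    fields = raw.rstrip(b"\x00").decode().split(":")
    return fields[2] if len(fields) >= 3 else None


def socket_problems(path):
    """List the ways path differs from the expected socket state."""
    st = os.stat(path)
    problems = []
    if not stat.S_ISSOCK(st.st_mode):
        problems.append("not a socket")
    if (st.st_uid, st.st_gid) != (EXPECTED_UID, EXPECTED_GID):
        problems.append(
            f"owner is {st.st_uid}:{st.st_gid}, "
            f"expected {EXPECTED_UID}:{EXPECTED_GID}"
        )
    label = selinux_type(path)
    if label is not None and label != EXPECTED_SELINUX_TYPE:
        problems.append(
            f"SELinux type is {label}, expected {EXPECTED_SELINUX_TYPE}"
        )
    return problems


if __name__ == "__main__":
    sock = "/var/run/kubevirt/daemons/pr/pr-helper.sock"
    for problem in socket_problems(sock):
        print(problem)
```

For the socket shown in the ls -lZ output above (root:root, container_var_run_t), this would report both an ownership and an SELinux type mismatch.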

Version-Release number of selected component (if applicable):

OpenShift Virtualization         4.16.5

How reproducible:

100%

Steps to Reproduce:

1. Attach a SCSI LUN disk to the VM with "reservation: true":

          - lun:
              bus: scsi
              reservation: true

2. Start the VM and, once it is running, restart the virt-handler pod. Then run a SCSI reservation command from the VM; the reservation fails:

[root@rhel8-tomato-pigeon-93]# sg_persist --out --register --param-sark=0xABCDEFGH /dev/sda
QEMU  QEMU HARDDISK  2.5+
Peripheral device type: disk
PR out (Register): Aborted command
sg_persist failed: Aborted command

Actual results:

Restarting the virt-handler pod breaks the connection to qemu-pr-helper, and reservations fail.

Expected results:

Because virt-handler can be restarted automatically (for example, during an upgrade or a configuration change), a running VM should not lose its reservation capability across the restart. Losing it breaks clustered applications running in the VM.

Additional info:

Assignee: Alice Frosi

Reporter: Nijin Ashok

QA Contact: Natalie Gavrielov

Votes: 0

Watchers: 4

Created: 2025/01/28 8:24 AM

Updated: 2025/09/24 8:40 PM
