  1. OpenShift Virtualization
  2. CNV-55559

Restarting the virt-handler pods breaks the connections to qemu-pr-helper, and reservations fail to work


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • CNV v4.16.6
    • Storage Ecosystem
    • None
    • Quality / Stability / Reliability
    • 0.42
    • False
    • False
    • None
    • Critical
    • None

      Description of problem:

      qemu-pr-helper runs in the virt-handler pods. The qemu-kvm process connects to this qemu-pr-helper to perform the reservations. When virt-handler is restarted, a new qemu-pr-helper process is started with a new socket file. This breaks the communication between qemu-kvm and qemu-pr-helper, so SCSI reservations from the guests fail.

      When I straced the qemu-kvm process, I saw "EACCES" when it tried to connect to the pr-helper socket:

      578027 06:52:35.418117 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 13<UNIX-STREAM:[675790106]> <0.000060>
      578027 06:52:35.418304 connect(13<UNIX-STREAM:[675790106]>, {sa_family=AF_UNIX, sun_path="/var/run/kubevirt/daemons/pr/pr-helper.sock"}, 110) = -1 EACCES (Permission denied) <0.000093>
      578027 06:52:35.418589 close(13<UNIX-STREAM:[675790106]>) = 0 <0.000037>
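
The connect attempt in the trace above can be approximated outside of QEMU with a small probe. This is only a sketch: `probe_unix_socket` is a hypothetical helper name, and the socket path is the one from the strace output. The result depends on the user and SELinux context the probe runs under, so running it as root on the node may succeed even when qemu-kvm is denied.

```python
import errno
import socket


def probe_unix_socket(path, timeout=2.0):
    """Try to connect to a UNIX stream socket.

    Returns None on success, or the errno name (e.g. "EACCES") on failure.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return None
    except OSError as exc:
        # Map the numeric errno to its symbolic name where one exists.
        return errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()


if __name__ == "__main__":
    # Path taken from the strace output in this report.
    result = probe_unix_socket("/var/run/kubevirt/daemons/pr/pr-helper.sock")
    print("connected" if result is None else f"connect failed: {result}")
```

To mirror what qemu-kvm sees, the probe would need to run with the same uid and SELinux context as the qemu-kvm process.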

      The socket file has the following ownership, mode, and SELinux label:

      [root@openshift-worker-cygnus-0 ~]# ls -lZ /var/run/kubevirt/daemons/pr/pr-helper.sock
      srwxr-xr-x. 1 root root system_u:object_r:container_var_run_t:s0 0 Jan 28 08:07 /var/run/kubevirt/daemons/pr/pr-helper.sock

      When the virt-handler pod restarts, it does not correct the ownership and SELinux label of pr-helper.sock. That correction happens only during the "allocate" phase of the device plugin at VM startup.

      The reservations work if I manually correct the ownership and label:

      [root@openshift-worker-cygnus-0 ~]# chown 107.107 /var/run/kubevirt/daemons/pr/pr-helper.sock
      [root@openshift-worker-cygnus-0 ~]# chcon -t container_file_t /var/run/kubevirt/daemons/pr/pr-helper.sock
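
A quick way to check whether the socket is in the expected state after a virt-handler restart is sketched below. It assumes the expected owner is uid/gid 107, as used in the `chown` command above. `check_socket_ownership` is a hypothetical helper name. The SELinux label is read from the `security.selinux` extended attribute, which is Linux-only and may be absent, in which case the sketch reports `None`.

```python
import os
import stat


def check_socket_ownership(path, expected_uid=107, expected_gid=107):
    """Report whether `path` is a socket owned by the expected uid/gid.

    The default of 107 is an assumption taken from the manual chown above.
    """
    st = os.stat(path)
    report = {
        "is_socket": stat.S_ISSOCK(st.st_mode),
        "mode": stat.filemode(st.st_mode),
        "uid": st.st_uid,
        "gid": st.st_gid,
        "owner_ok": st.st_uid == expected_uid and st.st_gid == expected_gid,
    }
    # The SELinux context is stored in the security.selinux xattr (Linux only).
    try:
        raw = os.getxattr(path, "security.selinux")
        report["selinux"] = raw.rstrip(b"\0").decode()
    except (OSError, AttributeError):
        report["selinux"] = None
    return report


if __name__ == "__main__":
    path = "/var/run/kubevirt/daemons/pr/pr-helper.sock"
    try:
        print(check_socket_ownership(path))
    except OSError as exc:
        print(f"cannot stat {path}: {exc}")
```

For the socket listed above (owned by root, type `container_var_run_t`), `owner_ok` would be false; after the manual `chown` and `chcon`, it would be true and the label would carry the `container_file_t` type.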

       

      Version-Release number of selected component (if applicable):

      OpenShift Virtualization         4.16.5

      How reproducible:

      100%

      Steps to Reproduce:

      1. Pass a disk to the VM with "reservation: true":

                - lun:
                    bus: scsi
                    reservation: true

      2. Start the VM and, once it is running, restart the virt-handler pod. Then run a SCSI reservation command from the VM; the reservation fails:

      [root@rhe18-tomato-pigeon-93]# sg_persist --out --register --param-sark=0xABCDEFGH /dev/sda
      QEMU  QEMU HARDDISK  2.5+
      Peripheral device type: disk
      PR out (Register): Aborted command
      sg_persist failed: Aborted command

      Actual results:

      Restarting the virt-handler pods breaks the connections to qemu-pr-helper, and reservations fail to work

      Expected results:

      virt-handler may be restarted automatically (for example, during an upgrade or a configuration change), so a running VM should not lose its reservation capability across the restart. Losing it breaks cluster applications running in the VM.

      Additional info:

       

              afrosirh Alice Frosi
              rhn-support-nashok Nijin Ashok
              Natalie Gavrielov Natalie Gavrielov