OpenShift Bugs / OCPBUGS-6008

Deleting large numbers of VMs causes pods to remain in Terminating state indefinitely

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Normal
    • None
    • 4.12.0
    • Node / Kubelet
    • Critical
    • Rejected
    • False

      Description of problem:

      Deleting a large number of VMs causes their pods to get stuck in the Terminating state indefinitely.

      Version-Release number of selected component (if applicable):

      OpenShift Virtualization   4.12.0-781
      Server Version: 4.12.0
      Kubernetes Version: v1.25.4+77bec7a

      How reproducible:

      100% reproducible with 1K container disk VMs.

      Steps to Reproduce:

      1. Deploy a large number of VMs (>= 500).
      2. Delete all VMs using something like "oc delete vm --all".
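
The reproduction steps above can be sketched as a small manifest generator. This is only an illustration, not the reporter's actual tooling: the container disk image, the memory request, and the helper names `vm_manifest` and `vm_list` are assumptions, and the `fedora-vmNNNN` naming is inferred from the pod names in this report.

```python
import json


def vm_manifest(index, image="quay.io/containerdisks/fedora:latest"):
    """Build one KubeVirt VirtualMachine manifest as a plain dict.

    The image and the 1Gi memory request are assumptions for illustration.
    """
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": f"fedora-vm{index:04d}"},
        "spec": {
            "running": True,
            "template": {
                "spec": {
                    "domain": {
                        "devices": {
                            "disks": [
                                {"name": "containerdisk", "disk": {"bus": "virtio"}}
                            ]
                        },
                        "resources": {"requests": {"memory": "1Gi"}},
                    },
                    "volumes": [
                        {"name": "containerdisk", "containerDisk": {"image": image}}
                    ],
                }
            },
        },
    }


def vm_list(count):
    """Wrap `count` manifests in a v1 List so a single apply can create them."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [vm_manifest(i) for i in range(count)],
    }


if __name__ == "__main__":
    # Emit JSON on stdout; 500 matches the lower bound given in step 1.
    print(json.dumps(vm_list(500)))
```

Saved under a hypothetical name such as gen_vms.py, the output could be piped into "oc apply -f -" for step 1, followed by "oc delete vm --all" for step 2.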
      

      Actual results:

      The VMs' virt-launcher pods get stuck in the Terminating state:
      virt-launcher-fedora-vm0962-5lg84   0/2     Terminating   0          5h45m
      virt-launcher-fedora-vm0966-zcrhq   0/2     Terminating   0          5h44m
      virt-launcher-fedora-vm0968-hzlzs   0/2     Terminating   0          5h45m
      virt-launcher-fedora-vm0969-kqqtm   0/2     Terminating   0          5h45m
      virt-launcher-fedora-vm0970-j2gv9   0/2     Terminating   0          5h27m
      virt-launcher-fedora-vm0971-mbpck   0/2     Terminating   0          5h45m
      virt-launcher-fedora-vm0972-8zvbx   0/2     Terminating   0          5h45m
      virt-launcher-fedora-vm0973-q4qp5   0/2     Terminating   0          5h45m
      virt-launcher-fedora-vm0976-nhsg8   0/2     Terminating   0          5h45m
      virt-launcher-fedora-vm0978-58fsk   0/2     Terminating   0          22h
      virt-launcher-fedora-vm0979-bprq6   0/2     Terminating   0          21h
      virt-launcher-fedora-vm0983-9phcq   0/2     Terminating   0          22h
      virt-launcher-fedora-vm0984-4f6l6   0/2     Terminating   0          5h45m
      virt-launcher-fedora-vm0986-gflz2   0/2     Terminating   0          22h
      virt-launcher-fedora-vm0989-phwx4   0/2     Terminating   0          5h45m
      virt-launcher-fedora-vm0990-nxptj   0/2     Terminating   0          5h45m
      virt-launcher-fedora-vm0991-pnvqt   0/2     Terminating   0          5h45m
      virt-launcher-fedora-vm0993-sk7wn   0/2     Terminating   0          21h
      virt-launcher-fedora-vm0995-jgsh6   0/2     Terminating   0          5h36m
      virt-launcher-fedora-vm0997-dqrms   0/2     Terminating   0          5h45m
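
To track how many pods are stuck and for how long, a listing like the one above can be summarized with a short parser. This is a sketch that assumes the usual "oc get pods" column order (NAME, READY, STATUS, RESTARTS, AGE); the function names are hypothetical.

```python
import re

# Seconds per unit for kubectl-style ages such as "5h45m" or "22h".
_AGE_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def age_to_seconds(age):
    """Convert an age string such as '5h45m' to a number of seconds."""
    parts = re.findall(r"(\d+)([smhd])", age)
    return sum(int(number) * _AGE_UNITS[unit] for number, unit in parts)


def terminating_pods(listing):
    """Return (pod_name, age_seconds) for Terminating rows, oldest first."""
    stuck = []
    for line in listing.splitlines():
        fields = line.split()
        # STATUS is the third column; AGE is always the last one.
        if len(fields) >= 5 and fields[2] == "Terminating":
            stuck.append((fields[0], age_to_seconds(fields[-1])))
    return sorted(stuck, key=lambda item: item[1], reverse=True)
```

Feeding it the saved output of "oc get pods" gives a list whose length is the number of stuck pods, with the longest-stuck pod first.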

      Expected results:

      All VMs are deleted.

      Additional info:

      I have enabled the following verbose logging:
        logVerbosityConfig:
          kubevirt:
            virtAPI: 9
            virtController: 9
            virtHandler: 9
            virtLauncher: 9
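
The report does not say where this fragment was applied. Assuming it sits under "spec" of the HyperConverged custom resource, as in the OpenShift Virtualization log verbosity documentation, the same settings can be expressed as a JSON merge patch. The helper name `verbosity_patch` is hypothetical.

```python
import json


def verbosity_patch(level):
    """Build a JSON merge patch that sets one verbosity level on all four components.

    Assumption: logVerbosityConfig lives under `spec` of the HyperConverged CR.
    """
    components = ("virtAPI", "virtController", "virtHandler", "virtLauncher")
    return {
        "spec": {
            "logVerbosityConfig": {
                "kubevirt": {name: level for name in components}
            }
        }
    }


if __name__ == "__main__":
    # Level 9 matches the values shown in this report.
    print(json.dumps(verbosity_patch(9)))
```

The printed JSON could then be passed to "oc patch" with "--type=merge" against the HyperConverged resource; the resource name and namespace depend on the installation and are not given in this report.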

      Some logs:

      label desc = connection error: desc = "transport: Error while dialing dial unix //pods/2cd63bba-5fa0-447c-a6b6-8b7c04a8a3a2/volumes/kubernetes.io~empty-dir/sockets/launcher-sock: connect: no such file or directory"
      13m         Warning   SyncFailed              virtualmachineinstance/fedora-vm0950   unknown error encountered sending command Ping: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix //pods/254bf261-2e1e-4f2d-8189-f5a11c0d01bb/volumes/kubernetes.io~empty-dir/sockets/launcher-sock: connect: no such file or directory"
      13m         Warning   SyncFailed              virtualmachineinstance/fedora-vm0955   unknown error encountered sending command Ping: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix //pods/0b88ef2e-401d-4e27-8815-536cca053ca4/volumes/kubernetes.io~empty-dir/sockets/launcher-sock: connect: no such file or directory"
      63s         Warning   SyncFailed              virtualmachineinstance/fedora-vm0957   unknown error encountered sending command Ping: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix //pods/2bcaeebd-1353-4c53-9fc8-a83f35eeb6d5/volumes/kubernetes.io~empty-dir/sockets/launcher-sock: connect: no such file or directory"
      63s         Warning   SyncFailed              virtualmachineinstance/fedora-vm0979   unknown error encountered sending command Ping: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix //pods/0f52145e-7880-4d40-ada4-ba81011b692f/volumes/kubernetes.io~empty-dir/sockets/launcher-sock: connect: no such file or directory"
      13m         Warning   SyncFailed              virtualmachineinstance/fedora-vm0983   unknown error encountered sending command Ping: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix //pods/2e090eea-2c9b-4b81-a380-d401171ea549/volumes/kubernetes.io~empty-dir/sockets/launcher-sock: connect: no such file or directory"
      63s         Warning   SyncFailed              virtualmachineinstance/fedora-vm0993   unknown error encountered sending command Ping: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix //pods/00dbde95-ab33-4dbb-a572-43e7cc002204/volumes/kubernetes.io~empty-dir/sockets/launcher-sock: connect: no such file or directory"
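
Each complete event above names a VMI and the UID of the pod whose launcher-sock path no longer exists. A small parser can pair them up so they can be compared against the Terminating pods. This is a sketch: the regular expression is fitted to the event text shown in this report, and the function name is hypothetical.

```python
import re

# Matches the VMI name and the pod UID embedded in the socket path.
_EVENT_RE = re.compile(
    r"virtualmachineinstance/(?P<vmi>\S+).*?"
    r"dial unix //pods/(?P<uid>[0-9a-f-]{36})/volumes/\S*launcher-sock: "
    r"connect: no such file or directory"
)


def missing_launcher_sockets(events):
    """Map VMI name -> pod UID for events reporting a missing launcher-sock."""
    found = {}
    for line in events.splitlines():
        match = _EVENT_RE.search(line)
        if match:
            found[match.group("vmi")] = match.group("uid")
    return found
```

Lines that lack the VMI reference, such as the truncated first line of the excerpt, are skipped rather than guessed at.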

            rphillip@redhat.com Ryan Phillips
            bbenshab Boaz Ben Shabat
            Chen Yosef Chen Yosef