OpenShift Virtualization / CNV-28990

[2209321] Automatic workload update failed after upgrade from 4.11.4->4.12.3

    • CNV Virtualization Sprint 238, CNV Virtualization Sprint 239, CNV Virtualization Sprint 240
    • Important
    • No

      Description of problem: Automatic workload update fails after a CNV upgrade from 4.11.4->4.12.3

      Version-Release number of selected component (if applicable):
      4.11.4->4.12.3

      How reproducible:
      Seen in 1 of 1 attempts

      Steps to Reproduce:
      1. Upgrade CNV from 4.11.4->4.12.3 with workloads that are live-migratable

      Actual results:
      ====================
      [cloud-user@ocp-ipi-executor-xl ~]$ oc get vmim -A
      NAMESPACE NAME PHASE VMI
      kmp-enabled-for-upgrade kubevirt-workload-update-brrcg Pending vm-upgrade-a-1684798012-582829
      kmp-enabled-for-upgrade kubevirt-workload-update-bsz4w Failed vm-upgrade-a-1684798012-582829
      kmp-enabled-for-upgrade kubevirt-workload-update-fxxjp Failed vm-upgrade-a-1684798012-582829
      kmp-enabled-for-upgrade kubevirt-workload-update-h55sx Scheduling vm-upgrade-b-1684798062-9389849
      kmp-enabled-for-upgrade kubevirt-workload-update-nhhjw Failed vm-upgrade-b-1684798062-9389849
      kmp-enabled-for-upgrade kubevirt-workload-update-pcw9r Failed vm-upgrade-b-1684798062-9389849
      kmp-enabled-for-upgrade kubevirt-workload-update-qtb4p Failed vm-upgrade-b-1684798062-9389849
      kmp-enabled-for-upgrade kubevirt-workload-update-r6g4c Failed vm-upgrade-a-1684798012-582829
      kmp-enabled-for-upgrade kubevirt-workload-update-vcxp5 Failed vm-upgrade-b-1684798062-9389849
      kmp-enabled-for-upgrade kubevirt-workload-update-wdl65 Failed vm-upgrade-a-1684798012-582829
      kmp-enabled-for-upgrade kubevirt-workload-update-x9m58 Failed vm-upgrade-a-1684798012-582829
      kmp-enabled-for-upgrade kubevirt-workload-update-xd2dn Failed vm-upgrade-b-1684798062-9389849
      test-upgrade-namespace kubevirt-evacuation-brgqc Succeeded vm-for-product-upgrade-ocs-1684797074-763264
      test-upgrade-namespace kubevirt-evacuation-scpl2 Succeeded manual-run-strategy-vm-1684797367-9024162
      test-upgrade-namespace kubevirt-workload-update-2srsz Failed vm-for-product-upgrade-ocs-1684797074-763264
      test-upgrade-namespace kubevirt-workload-update-42xtr Failed windows-vm-1684797613-5818446
      test-upgrade-namespace kubevirt-workload-update-4dw4z Failed vmb-macspoof-1684797942-502964
      test-upgrade-namespace kubevirt-workload-update-59mdh Failed windows-vm-1684797613-5818446
      test-upgrade-namespace kubevirt-workload-update-625h2 Scheduling vmb-macspoof-1684797942-502964
      test-upgrade-namespace kubevirt-workload-update-6jnrm Failed vm-for-product-upgrade-nfs-1684797075-074361
      test-upgrade-namespace kubevirt-workload-update-7c82l Failed vm-for-product-upgrade-nfs-1684797075-074361
      test-upgrade-namespace kubevirt-workload-update-89vfh Failed vma-macspoof-1684797941-7677495
      test-upgrade-namespace kubevirt-workload-update-8g2vb Scheduling windows-vm-1684797613-5818446
      test-upgrade-namespace kubevirt-workload-update-99jfd Failed always-run-strategy-vm-1684797368-4356692
      test-upgrade-namespace kubevirt-workload-update-9m4vq Failed vm-for-product-upgrade-ocs-1684797074-763264
      test-upgrade-namespace kubevirt-workload-update-bmp9g Failed vmb-macspoof-1684797942-502964
      test-upgrade-namespace kubevirt-workload-update-c4vlr Failed always-run-strategy-vm-1684797368-4356692
      test-upgrade-namespace kubevirt-workload-update-fsplx Failed always-run-strategy-vm-1684797368-4356692
      test-upgrade-namespace kubevirt-workload-update-fzg5f Failed vm-for-product-upgrade-ocs-1684797074-763264
      test-upgrade-namespace kubevirt-workload-update-jdg62 Failed windows-vm-1684797613-5818446
      test-upgrade-namespace kubevirt-workload-update-jqhsp Pending vm-for-product-upgrade-nfs-1684797075-074361
      test-upgrade-namespace kubevirt-workload-update-jxtf4 Failed vma-macspoof-1684797941-7677495
      test-upgrade-namespace kubevirt-workload-update-jzj7p Failed vm-for-product-upgrade-nfs-1684797075-074361
      test-upgrade-namespace kubevirt-workload-update-k2kkq Failed manual-run-strategy-vm-1684797367-9024162
      test-upgrade-namespace kubevirt-workload-update-k7gvq Failed always-run-strategy-vm-1684797368-4356692
      test-upgrade-namespace kubevirt-workload-update-kxcsq Failed vmb-macspoof-1684797942-502964
      test-upgrade-namespace kubevirt-workload-update-ncgqc Failed always-run-strategy-vm-1684797368-4356692
      test-upgrade-namespace kubevirt-workload-update-njc4g Failed vma-macspoof-1684797941-7677495
      test-upgrade-namespace kubevirt-workload-update-nsnhn Failed windows-vm-1684797613-5818446
      test-upgrade-namespace kubevirt-workload-update-p6vhv Failed manual-run-strategy-vm-1684797367-9024162
      test-upgrade-namespace kubevirt-workload-update-pfc22 Failed vma-macspoof-1684797941-7677495
      test-upgrade-namespace kubevirt-workload-update-qscks Failed manual-run-strategy-vm-1684797367-9024162
      test-upgrade-namespace kubevirt-workload-update-qxpm6 Failed vm-for-product-upgrade-ocs-1684797074-763264
      test-upgrade-namespace kubevirt-workload-update-rbg45 Failed windows-vm-1684797613-5818446
      test-upgrade-namespace kubevirt-workload-update-rqc55 Failed vma-macspoof-1684797941-7677495
      test-upgrade-namespace kubevirt-workload-update-s7rbb Failed vm-for-product-upgrade-nfs-1684797075-074361
      test-upgrade-namespace kubevirt-workload-update-vgnx6 Succeeded vm-snapshot-upgrade-a-1684797866-069799
      test-upgrade-namespace kubevirt-workload-update-wbgds Failed vm-for-product-upgrade-nfs-1684797075-074361
      test-upgrade-namespace kubevirt-workload-update-zpptm Succeeded fedora-hotplug-upg-1684797719-130676
      [cloud-user@ocp-ipi-executor-xl ~]$
      =======================
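The listing above is easier to read as a per-VMI tally of migration phases. The following is a minimal sketch, assuming the whitespace-separated `NAMESPACE NAME PHASE VMI` columns shown in the `oc get vmim -A` output; the function name `tally_vmim_phases` is hypothetical and not part of any KubeVirt tooling.

```python
from collections import Counter, defaultdict


def tally_vmim_phases(table_text):
    """Count migration phases per (namespace, VMI) from `oc get vmim -A` text."""
    tally = defaultdict(Counter)
    for line in table_text.splitlines():
        fields = line.split()
        # Skip blank lines, shell prompts, the header row, and any row that
        # does not have exactly the four expected columns.
        if len(fields) != 4 or fields[0] == "NAMESPACE":
            continue
        namespace, _migration_name, phase, vmi = fields
        tally[(namespace, vmi)][phase] += 1
    return tally


if __name__ == "__main__":
    # A few rows copied from the output in this report.
    sample = """\
NAMESPACE NAME PHASE VMI
kmp-enabled-for-upgrade kubevirt-workload-update-brrcg Pending vm-upgrade-a-1684798012-582829
kmp-enabled-for-upgrade kubevirt-workload-update-bsz4w Failed vm-upgrade-a-1684798012-582829
kmp-enabled-for-upgrade kubevirt-workload-update-fxxjp Failed vm-upgrade-a-1684798012-582829
test-upgrade-namespace kubevirt-workload-update-zpptm Succeeded fedora-hotplug-upg-1684797719-130676
"""
    for (namespace, vmi), phases in sorted(tally_vmim_phases(sample).items()):
        print(namespace, vmi, dict(phases))
```

Feeding it the full listing should make it quick to see which VMIs never reached `Succeeded` and how many retries each one accumulated.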
      [cloud-user@ocp-ipi-executor-xl ~]$ oc get vmi -A
      NAMESPACE NAME AGE PHASE IP NODENAME READY
      kmp-enabled-for-upgrade vm-upgrade-a-1684798012-582829 90m Running 10.131.0.82 c01-dbn-412-nh5nx-worker-0-mdt7r True
      kmp-enabled-for-upgrade vm-upgrade-b-1684798062-9389849 90m Running 10.131.0.81 c01-dbn-412-nh5nx-worker-0-mdt7r True
      test-upgrade-namespace always-run-strategy-vm-1684797368-4356692 103m Running 10.128.2.100 c01-dbn-412-nh5nx-worker-0-pgrrn True
      test-upgrade-namespace fedora-hotplug-upg-1684797719-130676 88m Running 10.129.2.93 c01-dbn-412-nh5nx-worker-0-t5tzl True
      test-upgrade-namespace manual-run-strategy-vm-1684797367-9024162 103m Running 10.128.2.101 c01-dbn-412-nh5nx-worker-0-pgrrn True
      test-upgrade-namespace vm-for-product-upgrade-hos-1684797076-2242768 88m Running 10.128.2.79 c01-dbn-412-nh5nx-worker-0-pgrrn True
      test-upgrade-namespace vm-for-product-upgrade-nfs-1684797075-074361 106m Running 10.131.0.125 c01-dbn-412-nh5nx-worker-0-mdt7r True
      test-upgrade-namespace vm-for-product-upgrade-ocs-1684797074-763264 108m Running 10.131.0.127 c01-dbn-412-nh5nx-worker-0-mdt7r True
      test-upgrade-namespace vm-snapshot-upgrade-a-1684797866-069799 88m Running 10.129.2.95 c01-dbn-412-nh5nx-worker-0-t5tzl True
      test-upgrade-namespace vma-macspoof-1684797941-7677495 82m Running 10.131.0.135 c01-dbn-412-nh5nx-worker-0-mdt7r True
      test-upgrade-namespace vmb-macspoof-1684797942-502964 84m Running 10.131.0.132 c01-dbn-412-nh5nx-worker-0-mdt7r True
      test-upgrade-namespace windows-vm-1684797613-5818446 99m Running 10.128.2.99 c01-dbn-412-nh5nx-worker-0-pgrrn True
      [cloud-user@ocp-ipi-executor-xl ~]$ oc get pdb -A
      NAMESPACE NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
      openshift-apiserver openshift-apiserver-pdb N/A 1 1 5h7m
      openshift-cloud-controller-manager openstack-cloud-controller-manager 1 N/A 1 3h26m
      openshift-cluster-csi-drivers openstack-cinder-csi-driver-controller-pdb N/A 1 1 5h7m
      openshift-cluster-storage-operator csi-snapshot-controller-pdb N/A 1 1 5h7m
      openshift-cluster-storage-operator csi-snapshot-webhook-pdb N/A 1 1 5h7m
      openshift-cnv virt-api-pdb 1 N/A 1 4h7m
      openshift-cnv virt-controller-pdb 1 N/A 1 4h6m
      openshift-cnv virt-exportproxy-pdb 1 N/A 1 77m
      openshift-console console N/A 1 1 4h57m
      openshift-console downloads N/A 1 1 4h57m
      openshift-etcd etcd-guard-pdb 2 N/A 1 3h40m
      openshift-image-registry image-registry 0 N/A 1 4h59m
      openshift-ingress router-default N/A 50% 1 5h7m
      openshift-kube-apiserver kube-apiserver-guard-pdb 2 N/A 1 5h7m
      openshift-kube-controller-manager kube-controller-manager-guard-pdb 2 N/A 1 5h7m
      openshift-kube-scheduler openshift-kube-scheduler-guard-pdb 2 N/A 1 5h7m
      openshift-monitoring alertmanager-main N/A 1 1 4h56m
      openshift-monitoring prometheus-adapter 1 N/A 1 4h55m
      openshift-monitoring prometheus-k8s 1 N/A 1 4h56m
      openshift-monitoring prometheus-operator-admission-webhook 1 N/A 1 5h6m
      openshift-monitoring thanos-querier-pdb 1 N/A 1 4h56m
      openshift-nmstate nmstate-webhook 1 N/A 1 4h41m
      openshift-oauth-apiserver oauth-apiserver-pdb N/A 1 1 5h7m
      openshift-operator-lifecycle-manager packageserver-pdb N/A 1 1 5h9m
      openshift-storage rook-ceph-mds-ocs-storagecluster-cephfilesystem 1 N/A 1 4h16m
      openshift-storage rook-ceph-mon-pdb N/A 1 1 4h14m
      openshift-storage rook-ceph-osd N/A 1 1 84m
      test-upgrade-namespace kubevirt-disruption-budget-594hm 2 N/A 0 106m
      test-upgrade-namespace kubevirt-disruption-budget-6sb68 1 N/A 0 103m
      test-upgrade-namespace kubevirt-disruption-budget-82l5l 2 N/A 0 99m
      test-upgrade-namespace kubevirt-disruption-budget-pb46v 1 N/A 0 108m
      test-upgrade-namespace kubevirt-disruption-budget-shglc 1 N/A 0 104m
      [cloud-user@ocp-ipi-executor-xl ~]$
      ========================
      After 3 hours, only 2 VMs had completed automatic live migration:
      ========================
      22:44:08 2023-05-23T02:44:06.824404 tests.compute.upgrade.utils ERROR Migratable vms: ['vm-for-product-upgrade-nfs-1684797075-074361', 'always-run-strategy-vm-1684797368-4356692', 'windows-vm-1684797613-5818446', 'vm-for-product-upgrade-ocs-1684797074-763264', 'manual-run-strategy-vm-1684797367-9024162'], vms with completed automatic workload update: ['vm-snapshot-upgrade-a-1684797866-069799', 'fedora-hotplug-upg-1684797719-130676'], and vms with failed automatic workload update: ['manual-run-strategy-vm-1684797367-9024162', 'vm-for-product-upgrade-nfs-1684797075-074361', 'windows-vm-1684797613-5818446', 'vm-for-product-upgrade-ocs-1684797074-763264', 'always-run-strategy-vm-1684797368-4356692']
      22:44:08
      =======================
      I see the following messages in the virt-controller log:
      =======================
      W0523 00:55:41.076823 1 warnings.go:70] would violate PodSecurity "restricted:v1.24": seLinuxOptions (pod set forbidden securityContext.seLinuxOptions: type "virt_launcher.process"), allowPrivilegeEscalation != false (container "compute" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "compute" must set securityContext.capabilities.drop=["ALL"]; container "compute" must not include "SYS_PTRACE" in securityContext.capabilities.add)

      {"component":"virt-controller","kind":"","level":"info","msg":"Created migration target pod test-upgrade-namespace/virt-launcher-windows-vm-1684797613-5818446-mgxpt with uuid a2611ad1-dccb-4e92-9d32-c994d0b0d1b7 for migration kubevirt-workload-update-8g2vb with uuid d19b62e4-99b0-4fd9-b379-e4f7e8a0bc21","name":"windows-vm-1684797613-5818446","namespace":"test-upgrade-namespace","pos":"migration.go:632","timestamp":"2023-05-23T00:55:41.077284Z","uid":"23a4099a-ab11-4f96-b571-ffd061612fab"}
      {"component":"virt-controller","kind":"","level":"warning","msg":"Migration target pod for VMI [test-upgrade-namespace/windows-vm-1684797613-5818446] is currently unschedulable.","name":"kubevirt-workload-update-8g2vb","namespace":"test-upgrade-namespace","pos":"migration.go:1092","timestamp":"2023-05-23T00:55:41.091782Z","uid":"d19b62e4-99b0-4fd9-b379-e4f7e8a0bc21"}
      {"component":"virt-controller","kind":"","level":"warning","msg":"Migration target pod for VMI [test-upgrade-namespace/windows-vm-1684797613-5818446] is currently unschedulable.","name":"kubevirt-workload-update-8g2vb","namespace":"test-upgrade-namespace","pos":"migration.go:1092","timestamp":"2023-05-23T00:55:41.101497Z","uid":"d19b62e4-99b0-4fd9-b379-e4f7e8a0bc21"}
      {"component":"virt-controller","kind":"","level":"info","msg":"Waiting to schedule target pod for vmi [test-upgrade-namespace/vm-for-product-upgrade-nfs-1684797075-074361] migration because total running parallel outbound migrations on target node [2] has hit outbound migrations per node limit.","name":"kubevirt-workload-update-jqhsp","namespace":"test-upgrade-namespace","pos":"migration.go:900","timestamp":"2023-05-23T00:55:46.007608Z","uid":"1e074014-9213-4947-8839-8eabfd94de17"}
      {"component":"virt-controller","level":"info","msg":"TSC Freqency node update status: 0 updated, 0 skipped, 0 errors","pos":"nodetopologyupdater.go:44","timestamp":"2023-05-23T00:55:47.165328Z"}
      {"component":"virt-controller","kind":"","level":"info","msg":"Waiting to schedule target pod for vmi [test-upgrade-namespace/vm-for-product-upgrade-nfs-1684797075-074361] migration because total running parallel outbound migrations on target node [2] has hit outbound migrations per node limit.","name":"kubevirt-workload-update-jqhsp","namespace":"test-upgrade-namespace","pos":"migration.go:900","timestamp":"2023-05-23T00:55:51.007928Z","uid":"1e074014-9213-4947-8839-8eabfd94de17"}
      {"component":"virt-controller","level":"info","msg":"certificate with common name 'export.kubevirt.io@1684798758' retrieved.","pos":"cert-manager.go:198","timestamp":"2023-05-23T00:55:51.322621Z"}
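
The virt-controller entries above are JSON objects, sometimes several to a line and mixed with klog-style text such as the PodSecurity warning. To see which migrations keep hitting the same warning, one option is to walk the text with `json.JSONDecoder.raw_decode` and count messages per migration. This is a rough sketch under those assumptions; the helper names `iter_json_objects` and `count_messages_by_migration` are hypothetical, and an object whose string was wrapped across lines by a paste will be skipped by the default strict decoder.

```python
import json
from collections import Counter


def iter_json_objects(text):
    """Yield each JSON object found in text, skipping non-JSON content."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not a valid object here; keep scanning
            continue
        pos = end
        yield obj


def count_messages_by_migration(text, level="warning"):
    """Count (name, msg) pairs for virt-controller entries at the given level."""
    counts = Counter()
    for entry in iter_json_objects(text):
        if entry.get("component") == "virt-controller" and entry.get("level") == level:
            counts[(entry.get("name"), entry.get("msg"))] += 1
    return counts


if __name__ == "__main__":
    # Two entries shortened from the log in this report (uid fields omitted).
    sample = (
        '{"component":"virt-controller","kind":"","level":"warning",'
        '"msg":"Migration target pod for VMI [test-upgrade-namespace/windows-vm-1684797613-5818446] is currently unschedulable.",'
        '"name":"kubevirt-workload-update-8g2vb","namespace":"test-upgrade-namespace"} '
        '{"component":"virt-controller","level":"info",'
        '"msg":"TSC Freqency node update status: 0 updated, 0 skipped, 0 errors"}'
    )
    for (name, msg), n in count_messages_by_migration(sample).items():
        print(n, name, msg)
```

Passing `level="info"` instead would surface the repeated "Waiting to schedule target pod" entries that mention the outbound migrations per node limit.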

      ====================
      [cloud-user@ocp-ipi-executor-xl ~]$ oc get clusterversion
      NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
      version 4.12.18 True False 15h Cluster version is 4.12.18
      [cloud-user@ocp-ipi-executor-xl ~]$ oc get csv -n openshift-cnv
      NAME DISPLAY VERSION REPLACES PHASE
      kubevirt-hyperconverged-operator.v4.12.3 OpenShift Virtualization 4.12.3 kubevirt-hyperconverged-operator.v4.11.4 Succeeded
      [cloud-user@ocp-ipi-executor-xl ~]$

      Expected results:
      VMs should complete the automatic workload update after the CNV upgrade

      Additional info:
      Must-gather saved here: https://drive.google.com/drive/folders/1HagvkR4n4h4uYbnppqDFlH3UgYCzkIS1?usp=share_link

              lpivarc Luboslav Pivarc
              rhn-support-dbasunag Debarati Basu-Nag