Bug
Resolution: Not a Bug
Major
None
Quality / Stability / Reliability
8
False
False
CLOSED
CNV Virtualization Sprint 238, CNV Virtualization Sprint 239, CNV Virtualization Sprint 240
Important
No
Description of problem: Automatic workload update fails after upgrading CNV from 4.11.4 to 4.12.3.
Version-Release number of selected component (if applicable):
4.11.4->4.12.3
How reproducible:
Observed in 1 of 1 attempts
Steps to Reproduce:
1. Upgrade CNV from 4.11.4 to 4.12.3 while live-migratable workloads are running
Actual results:
====================
[cloud-user@ocp-ipi-executor-xl ~]$ oc get vmim -A
NAMESPACE NAME PHASE VMI
kmp-enabled-for-upgrade kubevirt-workload-update-brrcg Pending vm-upgrade-a-1684798012-582829
kmp-enabled-for-upgrade kubevirt-workload-update-bsz4w Failed vm-upgrade-a-1684798012-582829
kmp-enabled-for-upgrade kubevirt-workload-update-fxxjp Failed vm-upgrade-a-1684798012-582829
kmp-enabled-for-upgrade kubevirt-workload-update-h55sx Scheduling vm-upgrade-b-1684798062-9389849
kmp-enabled-for-upgrade kubevirt-workload-update-nhhjw Failed vm-upgrade-b-1684798062-9389849
kmp-enabled-for-upgrade kubevirt-workload-update-pcw9r Failed vm-upgrade-b-1684798062-9389849
kmp-enabled-for-upgrade kubevirt-workload-update-qtb4p Failed vm-upgrade-b-1684798062-9389849
kmp-enabled-for-upgrade kubevirt-workload-update-r6g4c Failed vm-upgrade-a-1684798012-582829
kmp-enabled-for-upgrade kubevirt-workload-update-vcxp5 Failed vm-upgrade-b-1684798062-9389849
kmp-enabled-for-upgrade kubevirt-workload-update-wdl65 Failed vm-upgrade-a-1684798012-582829
kmp-enabled-for-upgrade kubevirt-workload-update-x9m58 Failed vm-upgrade-a-1684798012-582829
kmp-enabled-for-upgrade kubevirt-workload-update-xd2dn Failed vm-upgrade-b-1684798062-9389849
test-upgrade-namespace kubevirt-evacuation-brgqc Succeeded vm-for-product-upgrade-ocs-1684797074-763264
test-upgrade-namespace kubevirt-evacuation-scpl2 Succeeded manual-run-strategy-vm-1684797367-9024162
test-upgrade-namespace kubevirt-workload-update-2srsz Failed vm-for-product-upgrade-ocs-1684797074-763264
test-upgrade-namespace kubevirt-workload-update-42xtr Failed windows-vm-1684797613-5818446
test-upgrade-namespace kubevirt-workload-update-4dw4z Failed vmb-macspoof-1684797942-502964
test-upgrade-namespace kubevirt-workload-update-59mdh Failed windows-vm-1684797613-5818446
test-upgrade-namespace kubevirt-workload-update-625h2 Scheduling vmb-macspoof-1684797942-502964
test-upgrade-namespace kubevirt-workload-update-6jnrm Failed vm-for-product-upgrade-nfs-1684797075-074361
test-upgrade-namespace kubevirt-workload-update-7c82l Failed vm-for-product-upgrade-nfs-1684797075-074361
test-upgrade-namespace kubevirt-workload-update-89vfh Failed vma-macspoof-1684797941-7677495
test-upgrade-namespace kubevirt-workload-update-8g2vb Scheduling windows-vm-1684797613-5818446
test-upgrade-namespace kubevirt-workload-update-99jfd Failed always-run-strategy-vm-1684797368-4356692
test-upgrade-namespace kubevirt-workload-update-9m4vq Failed vm-for-product-upgrade-ocs-1684797074-763264
test-upgrade-namespace kubevirt-workload-update-bmp9g Failed vmb-macspoof-1684797942-502964
test-upgrade-namespace kubevirt-workload-update-c4vlr Failed always-run-strategy-vm-1684797368-4356692
test-upgrade-namespace kubevirt-workload-update-fsplx Failed always-run-strategy-vm-1684797368-4356692
test-upgrade-namespace kubevirt-workload-update-fzg5f Failed vm-for-product-upgrade-ocs-1684797074-763264
test-upgrade-namespace kubevirt-workload-update-jdg62 Failed windows-vm-1684797613-5818446
test-upgrade-namespace kubevirt-workload-update-jqhsp Pending vm-for-product-upgrade-nfs-1684797075-074361
test-upgrade-namespace kubevirt-workload-update-jxtf4 Failed vma-macspoof-1684797941-7677495
test-upgrade-namespace kubevirt-workload-update-jzj7p Failed vm-for-product-upgrade-nfs-1684797075-074361
test-upgrade-namespace kubevirt-workload-update-k2kkq Failed manual-run-strategy-vm-1684797367-9024162
test-upgrade-namespace kubevirt-workload-update-k7gvq Failed always-run-strategy-vm-1684797368-4356692
test-upgrade-namespace kubevirt-workload-update-kxcsq Failed vmb-macspoof-1684797942-502964
test-upgrade-namespace kubevirt-workload-update-ncgqc Failed always-run-strategy-vm-1684797368-4356692
test-upgrade-namespace kubevirt-workload-update-njc4g Failed vma-macspoof-1684797941-7677495
test-upgrade-namespace kubevirt-workload-update-nsnhn Failed windows-vm-1684797613-5818446
test-upgrade-namespace kubevirt-workload-update-p6vhv Failed manual-run-strategy-vm-1684797367-9024162
test-upgrade-namespace kubevirt-workload-update-pfc22 Failed vma-macspoof-1684797941-7677495
test-upgrade-namespace kubevirt-workload-update-qscks Failed manual-run-strategy-vm-1684797367-9024162
test-upgrade-namespace kubevirt-workload-update-qxpm6 Failed vm-for-product-upgrade-ocs-1684797074-763264
test-upgrade-namespace kubevirt-workload-update-rbg45 Failed windows-vm-1684797613-5818446
test-upgrade-namespace kubevirt-workload-update-rqc55 Failed vma-macspoof-1684797941-7677495
test-upgrade-namespace kubevirt-workload-update-s7rbb Failed vm-for-product-upgrade-nfs-1684797075-074361
test-upgrade-namespace kubevirt-workload-update-vgnx6 Succeeded vm-snapshot-upgrade-a-1684797866-069799
test-upgrade-namespace kubevirt-workload-update-wbgds Failed vm-for-product-upgrade-nfs-1684797075-074361
test-upgrade-namespace kubevirt-workload-update-zpptm Succeeded fedora-hotplug-upg-1684797719-130676
[cloud-user@ocp-ipi-executor-xl ~]$
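The listing above is easier to triage when the migrations are tallied per VMI. The following is a minimal Python sketch, not part of the original report: it assumes the plain-text output of `oc get vmim -A` with the four columns shown above, and the function name `summarize_vmim` is hypothetical.

```python
from collections import Counter, defaultdict


def summarize_vmim(table_text):
    """Count migration phases per VMI from `oc get vmim -A` text output.

    Assumes four whitespace-separated columns: NAMESPACE NAME PHASE VMI.
    """
    summary = defaultdict(Counter)
    for line in table_text.splitlines():
        fields = line.split()
        # Skip the header, shell prompts, and anything that is not a data row.
        if len(fields) != 4 or fields[0] == "NAMESPACE":
            continue
        namespace, _name, phase, vmi = fields
        summary[(namespace, vmi)][phase] += 1
    return summary


if __name__ == "__main__":
    # Three rows copied from the listing above.
    sample = """\
NAMESPACE NAME PHASE VMI
kmp-enabled-for-upgrade kubevirt-workload-update-brrcg Pending vm-upgrade-a-1684798012-582829
kmp-enabled-for-upgrade kubevirt-workload-update-bsz4w Failed vm-upgrade-a-1684798012-582829
kmp-enabled-for-upgrade kubevirt-workload-update-fxxjp Failed vm-upgrade-a-1684798012-582829
"""
    for (namespace, vmi), phases in summarize_vmim(sample).items():
        print(namespace, vmi, dict(phases))
```

Feeding the full listing to the function gives one counter per VMI, which makes it quick to see which VMIs have only failed or still-pending migrations.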
=======================
[cloud-user@ocp-ipi-executor-xl ~]$ oc get vmi -A
NAMESPACE NAME AGE PHASE IP NODENAME READY
kmp-enabled-for-upgrade vm-upgrade-a-1684798012-582829 90m Running 10.131.0.82 c01-dbn-412-nh5nx-worker-0-mdt7r True
kmp-enabled-for-upgrade vm-upgrade-b-1684798062-9389849 90m Running 10.131.0.81 c01-dbn-412-nh5nx-worker-0-mdt7r True
test-upgrade-namespace always-run-strategy-vm-1684797368-4356692 103m Running 10.128.2.100 c01-dbn-412-nh5nx-worker-0-pgrrn True
test-upgrade-namespace fedora-hotplug-upg-1684797719-130676 88m Running 10.129.2.93 c01-dbn-412-nh5nx-worker-0-t5tzl True
test-upgrade-namespace manual-run-strategy-vm-1684797367-9024162 103m Running 10.128.2.101 c01-dbn-412-nh5nx-worker-0-pgrrn True
test-upgrade-namespace vm-for-product-upgrade-hos-1684797076-2242768 88m Running 10.128.2.79 c01-dbn-412-nh5nx-worker-0-pgrrn True
test-upgrade-namespace vm-for-product-upgrade-nfs-1684797075-074361 106m Running 10.131.0.125 c01-dbn-412-nh5nx-worker-0-mdt7r True
test-upgrade-namespace vm-for-product-upgrade-ocs-1684797074-763264 108m Running 10.131.0.127 c01-dbn-412-nh5nx-worker-0-mdt7r True
test-upgrade-namespace vm-snapshot-upgrade-a-1684797866-069799 88m Running 10.129.2.95 c01-dbn-412-nh5nx-worker-0-t5tzl True
test-upgrade-namespace vma-macspoof-1684797941-7677495 82m Running 10.131.0.135 c01-dbn-412-nh5nx-worker-0-mdt7r True
test-upgrade-namespace vmb-macspoof-1684797942-502964 84m Running 10.131.0.132 c01-dbn-412-nh5nx-worker-0-mdt7r True
test-upgrade-namespace windows-vm-1684797613-5818446 99m Running 10.128.2.99 c01-dbn-412-nh5nx-worker-0-pgrrn True
[cloud-user@ocp-ipi-executor-xl ~]$ oc get pdb -A
NAMESPACE NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
openshift-apiserver openshift-apiserver-pdb N/A 1 1 5h7m
openshift-cloud-controller-manager openstack-cloud-controller-manager 1 N/A 1 3h26m
openshift-cluster-csi-drivers openstack-cinder-csi-driver-controller-pdb N/A 1 1 5h7m
openshift-cluster-storage-operator csi-snapshot-controller-pdb N/A 1 1 5h7m
openshift-cluster-storage-operator csi-snapshot-webhook-pdb N/A 1 1 5h7m
openshift-cnv virt-api-pdb 1 N/A 1 4h7m
openshift-cnv virt-controller-pdb 1 N/A 1 4h6m
openshift-cnv virt-exportproxy-pdb 1 N/A 1 77m
openshift-console console N/A 1 1 4h57m
openshift-console downloads N/A 1 1 4h57m
openshift-etcd etcd-guard-pdb 2 N/A 1 3h40m
openshift-image-registry image-registry 0 N/A 1 4h59m
openshift-ingress router-default N/A 50% 1 5h7m
openshift-kube-apiserver kube-apiserver-guard-pdb 2 N/A 1 5h7m
openshift-kube-controller-manager kube-controller-manager-guard-pdb 2 N/A 1 5h7m
openshift-kube-scheduler openshift-kube-scheduler-guard-pdb 2 N/A 1 5h7m
openshift-monitoring alertmanager-main N/A 1 1 4h56m
openshift-monitoring prometheus-adapter 1 N/A 1 4h55m
openshift-monitoring prometheus-k8s 1 N/A 1 4h56m
openshift-monitoring prometheus-operator-admission-webhook 1 N/A 1 5h6m
openshift-monitoring thanos-querier-pdb 1 N/A 1 4h56m
openshift-nmstate nmstate-webhook 1 N/A 1 4h41m
openshift-oauth-apiserver oauth-apiserver-pdb N/A 1 1 5h7m
openshift-operator-lifecycle-manager packageserver-pdb N/A 1 1 5h9m
openshift-storage rook-ceph-mds-ocs-storagecluster-cephfilesystem 1 N/A 1 4h16m
openshift-storage rook-ceph-mon-pdb N/A 1 1 4h14m
openshift-storage rook-ceph-osd N/A 1 1 84m
test-upgrade-namespace kubevirt-disruption-budget-594hm 2 N/A 0 106m
test-upgrade-namespace kubevirt-disruption-budget-6sb68 1 N/A 0 103m
test-upgrade-namespace kubevirt-disruption-budget-82l5l 2 N/A 0 99m
test-upgrade-namespace kubevirt-disruption-budget-pb46v 1 N/A 0 108m
test-upgrade-namespace kubevirt-disruption-budget-shglc 1 N/A 0 104m
[cloud-user@ocp-ipi-executor-xl ~]$
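In the PDB listing above, only the `kubevirt-disruption-budget-*` entries show 0 allowed disruptions. A small Python sketch for picking such entries out of the text output follows. It is an illustration only: the function name `blocked_pdbs` is hypothetical, it assumes the six-field row layout shown above, and the report does not establish whether these budgets are related to the failed migrations.

```python
def blocked_pdbs(table_text):
    """Return (namespace, name) for PDBs whose ALLOWED DISRUPTIONS is 0.

    Assumes six whitespace-separated fields per data row, as in the
    `oc get pdb -A` output above: namespace, name, min available,
    max unavailable, allowed disruptions, age.
    """
    blocked = []
    for line in table_text.splitlines():
        fields = line.split()
        # The header has nine tokens (three column titles contain a space),
        # so it is skipped here; data rows have exactly six.
        if len(fields) != 6:
            continue
        namespace, name, _min_avail, _max_unavail, allowed, _age = fields
        # Other six-token lines (for example a shell prompt) never carry
        # "0" in the fifth position, so they fall through.
        if allowed == "0":
            blocked.append((namespace, name))
    return blocked


if __name__ == "__main__":
    # Two rows copied from the listing above.
    sample = """\
NAMESPACE NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
openshift-cnv virt-api-pdb 1 N/A 1 4h7m
test-upgrade-namespace kubevirt-disruption-budget-594hm 2 N/A 0 106m
"""
    for namespace, name in blocked_pdbs(sample):
        print(namespace, name)
```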
========================
After 3 hours, only 2 VMs had completed the automatic live migration:
========================
22:44:08 2023-05-23T02:44:06.824404 tests.compute.upgrade.utils ERROR Migratable vms: ['vm-for-product-upgrade-nfs-1684797075-074361', 'always-run-strategy-vm-1684797368-4356692', 'windows-vm-1684797613-5818446', 'vm-for-product-upgrade-ocs-1684797074-763264', 'manual-run-strategy-vm-1684797367-9024162'], vms with completed automatic workload update: ['vm-snapshot-upgrade-a-1684797866-069799', 'fedora-hotplug-upg-1684797719-130676'], and vms with failed automatic workload update: ['manual-run-strategy-vm-1684797367-9024162', 'vm-for-product-upgrade-nfs-1684797075-074361', 'windows-vm-1684797613-5818446', 'vm-for-product-upgrade-ocs-1684797074-763264', 'always-run-strategy-vm-1684797368-4356692']
22:44:08
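The three lists in the error line above can be compared with plain set operations. The sketch below is illustrative and not taken from the test suite: the function name `compare_update_results` and the result keys are hypothetical, and the VM names are copied from the log line.

```python
def compare_update_results(migratable, completed, failed):
    """Compare the VM name lists reported in the upgrade test log."""
    migratable, completed, failed = set(migratable), set(completed), set(failed)
    return {
        # Migratable VMs whose automatic workload update failed.
        "migratable_and_failed": sorted(migratable & failed),
        # Migratable VMs that never completed the update.
        "migratable_not_completed": sorted(migratable - completed),
        # VMs that completed the update but are not in the migratable list.
        "completed_not_listed_migratable": sorted(completed - migratable),
    }


if __name__ == "__main__":
    migratable = [
        "vm-for-product-upgrade-nfs-1684797075-074361",
        "always-run-strategy-vm-1684797368-4356692",
        "windows-vm-1684797613-5818446",
        "vm-for-product-upgrade-ocs-1684797074-763264",
        "manual-run-strategy-vm-1684797367-9024162",
    ]
    completed = [
        "vm-snapshot-upgrade-a-1684797866-069799",
        "fedora-hotplug-upg-1684797719-130676",
    ]
    failed = [
        "manual-run-strategy-vm-1684797367-9024162",
        "vm-for-product-upgrade-nfs-1684797075-074361",
        "windows-vm-1684797613-5818446",
        "vm-for-product-upgrade-ocs-1684797074-763264",
        "always-run-strategy-vm-1684797368-4356692",
    ]
    for key, names in compare_update_results(migratable, completed, failed).items():
        print(key, names)
```

With the names from the log, every VM in the migratable list also appears in the failed list, and neither of the two completed VMs is in the migratable list.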
=======================
The virt-controller log contains messages like the following:
=======================
W0523 00:55:41.076823 1 warnings.go:70] would violate PodSecurity "restricted:v1.24": seLinuxOptions (pod set forbidden securityContext.seLinuxOptions: type "virt_launcher.process"), allowPrivilegeEscalation != false (container "compute" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "compute" must set securityContext.capabilities.drop=["ALL"]; container "compute" must not include "SYS_PTRACE" in securityContext.capabilities.add)
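The warning above packs several PodSecurity violations into one line. A minimal Python sketch for splitting it into individual items is shown below. The function name `split_violations` is hypothetical, and the parsing assumes the message layout seen in this log line (policy name in quotes, then comma-separated items with parenthesised details); it is not an official format description.

```python
def split_violations(warning):
    """Split a 'would violate PodSecurity' message into individual violations."""
    # Keep only the text after the quoted policy name, e.g. "restricted:v1.24": ...
    _, _, details = warning.partition('": ')
    parts, current, depth = [], [], 0
    for ch in details:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        # Only a comma outside parentheses separates two violations.
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


if __name__ == "__main__":
    message = (
        'would violate PodSecurity "restricted:v1.24": seLinuxOptions '
        '(pod set forbidden securityContext.seLinuxOptions: type "virt_launcher.process"), '
        'allowPrivilegeEscalation != false (container "compute" must set '
        "securityContext.allowPrivilegeEscalation=false), unrestricted capabilities "
        '(container "compute" must set securityContext.capabilities.drop=["ALL"]; '
        'container "compute" must not include "SYS_PTRACE" in securityContext.capabilities.add)'
    )
    for item in split_violations(message):
        print(item)
```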
====================
[cloud-user@ocp-ipi-executor-xl ~]$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.12.18 True False 15h Cluster version is 4.12.18
[cloud-user@ocp-ipi-executor-xl ~]$ oc get csv -n openshift-cnv
NAME DISPLAY VERSION REPLACES PHASE
kubevirt-hyperconverged-operator.v4.12.3 OpenShift Virtualization 4.12.3 kubevirt-hyperconverged-operator.v4.11.4 Succeeded
[cloud-user@ocp-ipi-executor-xl ~]$
Expected results:
VMs complete the automatic workload update after the CNV upgrade.
Additional info:
Must-gather saved here: https://drive.google.com/drive/folders/1HagvkR4n4h4uYbnppqDFlH3UgYCzkIS1?usp=share_link