Bug
Resolution: Done-Errata
Blocker
CNV v4.15.0
Storage Core Sprint 249
Description of problem:
The OCP upgrade is blocked because a VM with a hotplugged volume cannot be evicted from its node, so the node drain never completes.
Version-Release number of selected component (if applicable):
4.14.3, upgrading OCP to 4.15.0
How reproducible:
100%
Steps to Reproduce:
1. 2. 3.
Actual results:
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get nodes
NAME                                             STATUS                     ROLES                  AGE     VERSION
cnv-qe-infra-29.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      control-plane,master   6h6m    v1.28.6+0fb4726
cnv-qe-infra-30.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      control-plane,master   6h8m    v1.28.6+0fb4726
cnv-qe-infra-31.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      control-plane,master   6h8m    v1.28.6+0fb4726
cnv-qe-infra-32.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      worker                 4h53m   v1.28.6+0fb4726
cnv-qe-infra-33.cnvqe2.lab.eng.rdu2.redhat.com   Ready,SchedulingDisabled   worker                 4h51m   v1.27.10+28ed2d7
cnv-qe-infra-34.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      worker                 4h53m   v1.28.6+0fb4726
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$
PDB:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  creationTimestamp: "2024-02-08T22:52:42Z"
  generateName: kubevirt-disruption-budget-
  generation: 25
  name: kubevirt-disruption-budget-qdcr9
  namespace: test-upgrade-namespace
  ownerReferences:
  - apiVersion: kubevirt.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: VirtualMachineInstance
    name: fedora-hotplug-upg-1707432657-9790156
    uid: acb3931d-c57f-40d8-bfba-5e44e82ae8ab
  resourceVersion: "323478"
  uid: 88b6ae22-39fe-4da8-8d4a-52fcb33d98a6
spec:
  minAvailable: 1
  selector:
    matchLabels:
      kubevirt.io/created-by: acb3931d-c57f-40d8-bfba-5e44e82ae8ab
status:
  conditions:
  - lastTransitionTime: "2024-02-09T01:23:25Z"
    message: ""
    observedGeneration: 25
    reason: InsufficientPods
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 1
  desiredHealthy: 1
  disruptionsAllowed: 0
  expectedPods: 13
  observedGeneration: 25
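The PDB status explains why the drain stalls. For a PDB with an integer `minAvailable`, `desiredHealthy` equals that value, and the number of allowed voluntary disruptions is the surplus `currentHealthy - desiredHealthy`, floored at zero. The Python sketch below reproduces that arithmetic with the values reported in this PDB. It is a simplified model for illustration only, not the Kubernetes disruption controller code, and both function names are hypothetical.

```python
def disruptions_allowed(current_healthy: int, min_available: int) -> int:
    """Simplified model of PDB status for an integer spec.minAvailable.

    desiredHealthy equals minAvailable; the eviction allowance is the
    surplus of healthy pods over that floor, never below zero.
    """
    desired_healthy = min_available
    return max(0, current_healthy - desired_healthy)


def eviction_permitted(current_healthy: int, min_available: int) -> bool:
    # An eviction request is refused when no disruptions are allowed.
    return disruptions_allowed(current_healthy, min_available) > 0


# Values from this PDB: currentHealthy: 1 and spec.minAvailable: 1.
print(disruptions_allowed(current_healthy=1, min_available=1))  # 0
print(eviction_permitted(current_healthy=1, min_available=1))   # False
```

With only one healthy launcher pod and `minAvailable: 1`, there is no surplus, so every eviction of that pod is rejected. `expectedPods: 13` does not change the allowance here: the extra pods presumably match the `kubevirt.io/created-by` selector but are not healthy.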
VM:
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get vm fedora-hotplug-upg-1707432657-9790156 -n test-upgrade-namespace -o yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    kubemacpool.io/transaction-timestamp: "2024-02-08T22:52:23.534961582Z"
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1
  creationTimestamp: "2024-02-08T22:50:58Z"
  finalizers:
  - kubevirt.io/virtualMachineControllerFinalize
  generation: 3
  labels:
    created-by-dynamic-class-creator: "Yes"
    kubevirt.io/vm: fedora-hotplug-upg
  name: fedora-hotplug-upg-1707432657-9790156
  namespace: test-upgrade-namespace
  resourceVersion: "322703"
  uid: 12b277a7-5612-45df-8ee5-3c5c45be1a22
spec:
  running: true
  template:
    metadata:
      creationTimestamp: null
      labels:
        debugLogs: "true"
        kubevirt.io/domain: fedora-hotplug-upg-1707432657-9790156
        kubevirt.io/vm: fedora-hotplug-upg-1707432657-9790156
    spec:
      architecture: amd64
      domain:
        cpu:
          cores: 1
        devices:
          disks:
          - disk:
              bus: virtio
            name: containerdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          - disk:
              bus: scsi
            name: blank-dv
            serial: "1234567890"
          interfaces:
          - macAddress: 02:f5:7c:00:00:04
            masquerade: {}
            name: default
          rng: {}
        machine:
          type: pc-q35-rhel9.2.0
        resources:
          requests:
            memory: 1Gi
      networks:
      - name: default
        pod: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - containerDisk:
          image: quay.io/openshift-cnv/qe-cnv-tests-fedora:38@sha256:d0658b20dc8474caedd061f02ea4e5c3c35922a472f0ec141c264005291be2f3
        name: containerdisk
      - cloudInitNoCloud:
          userData: |-
            #cloud-config
            chpasswd:
              expire: false
            password: password
            user: fedora
            ssh_pwauth: true
            ssh_authorized_keys: [ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCj47ubVnxR16JU7ZfDli3N5QVBAwJBRh2xMryyjk5dtfugo5JIPGB2cyXTqEDdzuRmI+Vkb/A5duJyBRlA+9RndGGmhhMnj8and3wu5/cEb7DkF6ZJ25QV4LQx3K/i57LStUHXRTvruHOZ2nCuVXWqi7wSvz5YcvEv7O8pNF5uGmqHlShBdxQxcjurXACZ1YY0YDJDr3AJai1KF9zehVJODuSbrnOYpThVWGjFuFAnNxbtuZ8EOSougN2aYTf2qr/KFGDHtewIkzZmP6cjzKO5bN3pVbXxmb2Gces/BYHntY4MXBTUqwsmsCRC5SAz14bEP/vsLtrNhjq9vCS+BjMT root@exec1.rdocloud]
            runcmd: ['grep ssh-rsa /etc/crypto-policies/back-ends/opensshserver.config || sudo update-crypto-policies --set LEGACY || true', "sudo sed -i 's/^#\\?PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config", 'sudo systemctl enable sshd', 'sudo systemctl restart sshd']
        name: cloudinitdisk
      - dataVolume:
          hotpluggable: true
          name: blank-dv
        name: blank-dv
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-02-08T22:53:09Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: null
    status: "True"
    type: LiveMigratable
  - lastProbeTime: "2024-02-08T22:53:50Z"
    lastTransitionTime: null
    status: "True"
    type: AgentConnected
  created: true
  desiredGeneration: 3
  observedGeneration: 3
  printableStatus: Running
  ready: true
  volumeSnapshotStatuses:
  - enabled: false
    name: containerdisk
    reason: Snapshot is not supported for this volumeSource type [containerdisk]
  - enabled: false
    name: cloudinitdisk
    reason: Snapshot is not supported for this volumeSource type [cloudinitdisk]
  - enabled: true
    name: blank-dv
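In this VM the only hotpluggable volume is `blank-dv`, a DataVolume with `hotpluggable: true` attached as a SCSI disk. When triaging similar reports it can help to list such volumes programmatically. The Python sketch below assumes the VM manifest has already been parsed into a dict (for example with a YAML library, not shown) and that the `hotpluggable` flag sits on the `dataVolume` or `persistentVolumeClaim` volume source; the function name is hypothetical.

```python
from typing import Any


def hotpluggable_volumes(vm: dict[str, Any]) -> list[str]:
    """Return the names of VirtualMachine template volumes marked hotpluggable.

    Assumes the hotpluggable flag lives on the volume source, as it does
    for the dataVolume and persistentVolumeClaim sources.
    """
    template_spec = vm.get("spec", {}).get("template", {}).get("spec", {})
    names = []
    for volume in template_spec.get("volumes", []):
        for source_key in ("dataVolume", "persistentVolumeClaim"):
            source = volume.get(source_key)
            if isinstance(source, dict) and source.get("hotpluggable", False):
                names.append(volume["name"])
    return names


# Trimmed copy of this report's VM: only the fields the function reads.
vm = {
    "spec": {
        "template": {
            "spec": {
                "volumes": [
                    {"name": "containerdisk", "containerDisk": {}},
                    {"name": "cloudinitdisk", "cloudInitNoCloud": {}},
                    {"name": "blank-dv", "dataVolume": {"name": "blank-dv", "hotpluggable": True}},
                ]
            }
        }
    }
}

print(hotpluggable_volumes(vm))  # ['blank-dv']
```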
The following errors appear in the machine-config-controller log:
E0209 01:32:34.367292 1 render_controller.go:439] Error syncing Generated MCFG: %!w(*errors.StatusError=&{{{ } { <nil>} Failure Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "worker": the object has been modified; please apply your changes to the latest version and try again Conflict 0xc0041148a0 409}})
E0209 01:32:34.391824 1 render_controller.go:461] Error updating MachineConfigPool worker: Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "worker": the object has been modified; please apply your changes to the latest version and try again
I0209 01:32:34.391846 1 render_controller.go:378] Error syncing machineconfigpool worker: Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "worker": the object has been modified; please apply your changes to the latest version and try again
I0209 01:32:38.771452 1 drain_controller.go:152] evicting pod test-upgrade-namespace/virt-launcher-fedora-hotplug-upg-1707432657-9790156-nctjv
E0209 01:32:38.788727 1 drain_controller.go:152] error when evicting pods/"virt-launcher-fedora-hotplug-upg-1707432657-9790156-nctjv" -n "test-upgrade-namespace" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0209 01:32:39.258979 1 node_controller.go:1035] No nodes available for updates
I0209 01:32:39.259123 1 status.go:224] Degraded Machine: cnv-qe-infra-33.cnvqe2.lab.eng.rdu2.redhat.com and Degraded Reason: failed to drain node: cnv-qe-infra-33.cnvqe2.lab.eng.rdu2.redhat.com after 1 hour. Please see machine-config-controller logs for more information
I0209 01:32:43.789715 1 drain_controller.go:152] evicting pod test-upgrade-namespace/virt-launcher-fedora-hotplug-upg-1707432657-9790156-nctjv
E0209 01:32:43.804143 1 drain_controller.go:152] error when evicting pods/"virt-launcher-fedora-hotplug-upg-1707432657-9790156-nctjv" -n "test-upgrade-namespace" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0209 01:32:44.291672 1 node_controller.go:1035] No nodes available for updates
I0209 01:32:44.291877 1 status.go:224] Degraded Machine: cnv-qe-infra-33.cnvqe2.lab.eng.rdu2.redhat.com and Degraded Reason: failed to drain node: cnv-qe-infra-33.cnvqe2.lab.eng.rdu2.redhat.com after 1 hour. Please see machine-config-controller logs for more information
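The log shows the drain controller re-issuing the eviction of `virt-launcher-...-nctjv` every 5 seconds, reporting the node as degraded once the one-hour drain limit has passed, and then continuing to retry. The Python sketch below models that retry pattern with a simulated clock. It is an illustration under stated assumptions, not the machine-config-operator implementation: `drain_pod`, `DrainResult`, `PDBViolation`, and the `evict` callback are hypothetical, and only the 5-second interval and 1-hour limit come from the log.

```python
from dataclasses import dataclass
from typing import Callable

RETRY_INTERVAL_S = 5       # "will retry after 5s" in the log
DRAIN_TIMEOUT_S = 60 * 60  # "failed to drain node ... after 1 hour"


class PDBViolation(Exception):
    """Stands in for the 'would violate the pod's disruption budget' refusal."""


@dataclass
class DrainResult:
    evicted: bool
    attempts: int
    degraded: bool


def drain_pod(evict: Callable[[], None], max_attempts: int) -> DrainResult:
    """Retry an eviction on a fixed interval using a simulated clock.

    The node is flagged degraded once elapsed time reaches the drain
    timeout, but retries continue until max_attempts is exhausted.
    """
    elapsed = 0
    degraded = False
    for attempt in range(1, max_attempts + 1):
        try:
            evict()
            return DrainResult(evicted=True, attempts=attempt, degraded=degraded)
        except PDBViolation:
            pass  # blocked by the PDB; wait and try again
        elapsed += RETRY_INTERVAL_S
        if elapsed >= DRAIN_TIMEOUT_S:
            degraded = True
    return DrainResult(evicted=False, attempts=max_attempts, degraded=degraded)


def always_blocked() -> None:
    # With disruptionsAllowed: 0, every eviction request is refused.
    raise PDBViolation()


result = drain_pod(always_blocked, max_attempts=800)
print(result.evicted, result.degraded, result.attempts)  # False True 800
```

Because the PDB never grants a disruption, no number of retries succeeds; the drain can only make progress if the VMI leaves the node some other way (for example, by a successful live migration).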
Many virt-launcher pods for the hotplug VM are stuck in Error or Completed state:
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get pods -n test-upgrade-namespace
NAME READY STATUS RESTARTS AGE
hp-volume-d82rr 0/1 Pending 0 62m
virt-launcher-always-run-strategy-vm-1707432374-0880806-9rf26 1/1 Running 0 62m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-2rb5v 0/2 Error 0 62m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-49q59 0/2 Error 0 42m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-5h9tm 0/2 Error 0 56m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-7hmmx 0/2 Error 0 7m21s
virt-launcher-fedora-hotplug-upg-1707432657-9790156-k2dsr 0/2 Error 0 48m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-mxvd2 0/2 Error 0 59m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-nctjv 2/2 Running 0 162m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-nj528 0/2 Error 0 58m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-nzv7c 0/2 Error 0 19m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-qdj6p 0/2 Completed 0 24m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-v5bbs 0/2 Error 0 36m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-w4gtm 0/2 Completed 0 30m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-w6dpd 0/2 Completed 0 13m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-x9w9v 0/2 Error 0 53m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-zrhbg 0/2 Error 0 95s
virt-launcher-manual-run-strategy-vm-1707432373-755645-f8gm8 1/1 Running 0 59m
virt-launcher-vm-bridge-connected-1707433221-2068923-8cgff 2/2 Running 0 62m
virt-launcher-vm-for-product-upgrade-ocs-1707432301-53998376q95 1/1 Running 0 85m
virt-launcher-vm-snapshot-upgrade-a-1707432958-4519243-5bc6p 1/1 Running 0 85m
virt-launcher-vma-macspoof-1707433341-3924508-6dptd 2/2 Running 0 106m
virt-launcher-vmb-macspoof-1707433342-277827-pxgs2 2/2 Running 0 106m
virt-launcher-windows-vm-1707432617-3495035-gdlnw 1/1 Running 0 85m
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$
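To quantify the listing above, the launcher pods of `fedora-hotplug-upg-1707432657-9790156` can be tallied by the STATUS column. The Python sketch below assumes the `oc get pods` output has been captured as text; it reproduces the header, the pending `hp-volume` pod, and all fifteen launcher rows for that VM verbatim, and the `status_counts` helper is hypothetical.

```python
from collections import Counter

POD_LIST = """\
NAME READY STATUS RESTARTS AGE
hp-volume-d82rr 0/1 Pending 0 62m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-2rb5v 0/2 Error 0 62m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-49q59 0/2 Error 0 42m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-5h9tm 0/2 Error 0 56m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-7hmmx 0/2 Error 0 7m21s
virt-launcher-fedora-hotplug-upg-1707432657-9790156-k2dsr 0/2 Error 0 48m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-mxvd2 0/2 Error 0 59m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-nctjv 2/2 Running 0 162m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-nj528 0/2 Error 0 58m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-nzv7c 0/2 Error 0 19m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-qdj6p 0/2 Completed 0 24m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-v5bbs 0/2 Error 0 36m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-w4gtm 0/2 Completed 0 30m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-w6dpd 0/2 Completed 0 13m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-x9w9v 0/2 Error 0 53m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-zrhbg 0/2 Error 0 95s
"""


def status_counts(listing: str, name_prefix: str) -> Counter:
    """Count pods whose NAME starts with name_prefix, keyed by STATUS."""
    counts: Counter = Counter()
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith(name_prefix):
            counts[fields[2]] += 1
    return counts


counts = status_counts(POD_LIST, "virt-launcher-fedora-hotplug-upg-1707432657-9790156-")
print(dict(counts))  # {'Error': 11, 'Running': 1, 'Completed': 3}
```

Of the fifteen launcher pods for this one VMI, only `nctjv` is Running and Ready; the other fourteen are in Error or Completed state and are not healthy from the PDB's point of view.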
Expected results:
The OCP upgrade should complete successfully.
Additional info:
The cluster is available for triage. A must-gather will be attached.
- Is cloned by: CNV-38277 [4.14] No ability to evict a hotplugged vm this is blocking ocp upgrade (Closed)
- Is related to: CNV-35196 Non-blocking / Best effort live migration during eviction (Backlog)
- Links to: RHSA-2023:116760 OpenShift Virtualization 4.15.0 Images