Bug
Resolution: Done-Errata
Blocker
CNV v4.15.0
Incidents & Support
Storage Core Sprint 249
Description of problem:
The OCP upgrade is blocked because a VM with a hotpluggable volume cannot be evicted from the node being drained.
Version-Release number of selected component (if applicable):
4.14.3, while upgrading OCP to 4.15.0
How reproducible:
100%
Steps to Reproduce:
1. 2. 3.
Actual results:
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get nodes
NAME                                             STATUS                     ROLES                  AGE     VERSION
cnv-qe-infra-29.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      control-plane,master   6h6m    v1.28.6+0fb4726
cnv-qe-infra-30.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      control-plane,master   6h8m    v1.28.6+0fb4726
cnv-qe-infra-31.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      control-plane,master   6h8m    v1.28.6+0fb4726
cnv-qe-infra-32.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      worker                 4h53m   v1.28.6+0fb4726
cnv-qe-infra-33.cnvqe2.lab.eng.rdu2.redhat.com   Ready,SchedulingDisabled   worker                 4h51m   v1.27.10+28ed2d7
cnv-qe-infra-34.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      worker                 4h53m   v1.28.6+0fb4726
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$
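For triage, the node listing above can be scanned for cordoned nodes, which are the ones whose drain has not finished. The following is a minimal sketch, assuming the default five-column `oc get nodes` text output; the function name `stalled_nodes` is hypothetical and not part of any tool.

```python
def stalled_nodes(oc_output: str) -> list[tuple[str, str]]:
    """Return (name, kubelet version) for nodes marked SchedulingDisabled.

    Assumes the default `oc get nodes` columns:
    NAME STATUS ROLES AGE VERSION.
    """
    result = []
    for line in oc_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a 5-column node row.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, status, _roles, _age, version = fields
        # STATUS is a comma-separated list such as "Ready,SchedulingDisabled".
        if "SchedulingDisabled" in status.split(","):
            result.append((name, version))
    return result


SAMPLE = """\
NAME STATUS ROLES AGE VERSION
cnv-qe-infra-32.cnvqe2.lab.eng.rdu2.redhat.com Ready worker 4h53m v1.28.6+0fb4726
cnv-qe-infra-33.cnvqe2.lab.eng.rdu2.redhat.com Ready,SchedulingDisabled worker 4h51m v1.27.10+28ed2d7
"""

for name, version in stalled_nodes(SAMPLE):
    print(f"{name} is cordoned and still reports {version}")
```

In this report only cnv-qe-infra-33 is cordoned, and it is also the only node still reporting the older kubelet version.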
PDB:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  creationTimestamp: "2024-02-08T22:52:42Z"
  generateName: kubevirt-disruption-budget-
  generation: 25
  name: kubevirt-disruption-budget-qdcr9
  namespace: test-upgrade-namespace
  ownerReferences:
  - apiVersion: kubevirt.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: VirtualMachineInstance
    name: fedora-hotplug-upg-1707432657-9790156
    uid: acb3931d-c57f-40d8-bfba-5e44e82ae8ab
  resourceVersion: "323478"
  uid: 88b6ae22-39fe-4da8-8d4a-52fcb33d98a6
spec:
  minAvailable: 1
  selector:
    matchLabels:
      kubevirt.io/created-by: acb3931d-c57f-40d8-bfba-5e44e82ae8ab
status:
  conditions:
  - lastTransitionTime: "2024-02-09T01:23:25Z"
    message: ""
    observedGeneration: 25
    reason: InsufficientPods
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 1
  desiredHealthy: 1
  disruptionsAllowed: 0
  expectedPods: 13
  observedGeneration: 25
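The PDB status explains why every eviction attempt is refused: with `minAvailable: 1` and a single healthy pod, there is no headroom for a disruption. The sketch below is a simplified model of that arithmetic, not the actual controller code; the function names are hypothetical, and only the status fields shown in the PDB above are used.

```python
def disruptions_allowed(current_healthy: int, desired_healthy: int) -> int:
    """Simplified model: headroom is healthy pods beyond the required minimum."""
    return max(0, current_healthy - desired_healthy)


def eviction_permitted(status: dict) -> bool:
    """An eviction can only proceed when at least one disruption is allowed."""
    return disruptions_allowed(
        status["currentHealthy"], status["desiredHealthy"]
    ) > 0


# Values copied from the PDB status in this report.
pdb_status = {"currentHealthy": 1, "desiredHealthy": 1, "expectedPods": 13}

print(disruptions_allowed(pdb_status["currentHealthy"], pdb_status["desiredHealthy"]))
print(eviction_permitted(pdb_status))
```

Under this model the running virt-launcher pod can only leave the node once a second healthy pod for the same VMI exists, which matches the `disruptionsAllowed: 0` and `InsufficientPods` values reported above.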
VM:
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get vm fedora-hotplug-upg-1707432657-9790156 -n test-upgrade-namespace -o yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    kubemacpool.io/transaction-timestamp: "2024-02-08T22:52:23.534961582Z"
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1
  creationTimestamp: "2024-02-08T22:50:58Z"
  finalizers:
  - kubevirt.io/virtualMachineControllerFinalize
  generation: 3
  labels:
    created-by-dynamic-class-creator: "Yes"
    kubevirt.io/vm: fedora-hotplug-upg
  name: fedora-hotplug-upg-1707432657-9790156
  namespace: test-upgrade-namespace
  resourceVersion: "322703"
  uid: 12b277a7-5612-45df-8ee5-3c5c45be1a22
spec:
  running: true
  template:
    metadata:
      creationTimestamp: null
      labels:
        debugLogs: "true"
        kubevirt.io/domain: fedora-hotplug-upg-1707432657-9790156
        kubevirt.io/vm: fedora-hotplug-upg-1707432657-9790156
    spec:
      architecture: amd64
      domain:
        cpu:
          cores: 1
        devices:
          disks:
          - disk:
              bus: virtio
            name: containerdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          - disk:
              bus: scsi
            name: blank-dv
            serial: "1234567890"
          interfaces:
          - macAddress: 02:f5:7c:00:00:04
            masquerade: {}
            name: default
          rng: {}
        machine:
          type: pc-q35-rhel9.2.0
        resources:
          requests:
            memory: 1Gi
      networks:
      - name: default
        pod: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - containerDisk:
          image: quay.io/openshift-cnv/qe-cnv-tests-fedora:38@sha256:d0658b20dc8474caedd061f02ea4e5c3c35922a472f0ec141c264005291be2f3
        name: containerdisk
      - cloudInitNoCloud:
          userData: |-
            #cloud-config
            chpasswd:
              expire: false
            password: password
            user: fedora
            ssh_pwauth: true
            ssh_authorized_keys:
              [ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCj47ubVnxR16JU7ZfDli3N5QVBAwJBRh2xMryyjk5dtfugo5JIPGB2cyXTqEDdzuRmI+Vkb/A5duJyBRlA+9RndGGmhhMnj8and3wu5/cEb7DkF6ZJ25QV4LQx3K/i57LStUHXRTvruHOZ2nCuVXWqi7wSvz5YcvEv7O8pNF5uGmqHlShBdxQxcjurXACZ1YY0YDJDr3AJai1KF9zehVJODuSbrnOYpThVWGjFuFAnNxbtuZ8EOSougN2aYTf2qr/KFGDHtewIkzZmP6cjzKO5bN3pVbXxmb2Gces/BYHntY4MXBTUqwsmsCRC5SAz14bEP/vsLtrNhjq9vCS+BjMT root@exec1.rdocloud]
            runcmd: ['grep ssh-rsa /etc/crypto-policies/back-ends/opensshserver.config || sudo update-crypto-policies --set LEGACY || true', "sudo sed -i 's/^#\\?PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config", 'sudo systemctl enable sshd', 'sudo systemctl restart sshd']
        name: cloudinitdisk
      - dataVolume:
          hotpluggable: true
          name: blank-dv
        name: blank-dv
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-02-08T22:53:09Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: null
    status: "True"
    type: LiveMigratable
  - lastProbeTime: "2024-02-08T22:53:50Z"
    lastTransitionTime: null
    status: "True"
    type: AgentConnected
  created: true
  desiredGeneration: 3
  observedGeneration: 3
  printableStatus: Running
  ready: true
  volumeSnapshotStatuses:
  - enabled: false
    name: containerdisk
    reason: Snapshot is not supported for this volumeSource type [containerdisk]
  - enabled: false
    name: cloudinitdisk
    reason: Snapshot is not supported for this volumeSource type [cloudinitdisk]
  - enabled: true
    name: blank-dv
The following entries appear in the machine-config-controller log:
E0209 01:32:34.367292 1 render_controller.go:439] Error syncing Generated MCFG: %!w(*errors.StatusError=&{{{ } { <nil>} Failure Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "worker": the object has been modified; please apply your changes to the latest version and try again Conflict 0xc0041148a0 409}})
E0209 01:32:34.391824 1 render_controller.go:461] Error updating MachineConfigPool worker: Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "worker": the object has been modified; please apply your changes to the latest version and try again
I0209 01:32:34.391846 1 render_controller.go:378] Error syncing machineconfigpool worker: Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "worker": the object has been modified; please apply your changes to the latest version and try again
I0209 01:32:38.771452 1 drain_controller.go:152] evicting pod test-upgrade-namespace/virt-launcher-fedora-hotplug-upg-1707432657-9790156-nctjv
E0209 01:32:38.788727 1 drain_controller.go:152] error when evicting pods/"virt-launcher-fedora-hotplug-upg-1707432657-9790156-nctjv" -n "test-upgrade-namespace" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0209 01:32:39.258979 1 node_controller.go:1035] No nodes available for updates
I0209 01:32:39.259123 1 status.go:224] Degraded Machine: cnv-qe-infra-33.cnvqe2.lab.eng.rdu2.redhat.com and Degraded Reason: failed to drain node: cnv-qe-infra-33.cnvqe2.lab.eng.rdu2.redhat.com after 1 hour. Please see machine-config-controller logs for more information
I0209 01:32:43.789715 1 drain_controller.go:152] evicting pod test-upgrade-namespace/virt-launcher-fedora-hotplug-upg-1707432657-9790156-nctjv
E0209 01:32:43.804143 1 drain_controller.go:152] error when evicting pods/"virt-launcher-fedora-hotplug-upg-1707432657-9790156-nctjv" -n "test-upgrade-namespace" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0209 01:32:44.291672 1 node_controller.go:1035] No nodes available for updates
I0209 01:32:44.291877 1 status.go:224] Degraded Machine: cnv-qe-infra-33.cnvqe2.lab.eng.rdu2.redhat.com and Degraded Reason: failed to drain node: cnv-qe-infra-33.cnvqe2.lab.eng.rdu2.redhat.com after 1 hour. Please see machine-config-controller logs for more information
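The drain controller retries the same eviction every 5 seconds, so the log grows quickly. A small script can summarize which pods are blocking the drain. This is a rough sketch that assumes the `error when evicting pods/"<pod>" -n "<namespace>"` message format seen in the lines above; the helper name `count_eviction_failures` is hypothetical.

```python
import re
from collections import Counter

# Matches the eviction error message format seen in the log excerpt above.
EVICT_ERR = re.compile(
    r'error when evicting pods/"(?P<pod>[^"]+)" -n "(?P<ns>[^"]+)"'
)


def count_eviction_failures(log_text: str) -> Counter:
    """Count failed eviction attempts per (namespace, pod)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EVICT_ERR.search(line)
        if match:
            counts[(match.group("ns"), match.group("pod"))] += 1
    return counts


SAMPLE_LOG = """\
I0209 01:32:38.771452 1 drain_controller.go:152] evicting pod test-upgrade-namespace/virt-launcher-fedora-hotplug-upg-1707432657-9790156-nctjv
E0209 01:32:38.788727 1 drain_controller.go:152] error when evicting pods/"virt-launcher-fedora-hotplug-upg-1707432657-9790156-nctjv" -n "test-upgrade-namespace" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0209 01:32:39.258979 1 node_controller.go:1035] No nodes available for updates
E0209 01:32:43.804143 1 drain_controller.go:152] error when evicting pods/"virt-launcher-fedora-hotplug-upg-1707432657-9790156-nctjv" -n "test-upgrade-namespace" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
"""

for (namespace, pod), attempts in count_eviction_failures(SAMPLE_LOG).items():
    print(f"{namespace}/{pod}: {attempts} failed eviction attempts")
```

Run against the full controller log, a single pod dominating the counts would point at the VM whose disruption budget is blocking the drain.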
Most of the virt-launcher pods for this VM are in the Error state:
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get pods -n test-upgrade-namespace
NAME                                                              READY   STATUS      RESTARTS   AGE
hp-volume-d82rr                                                   0/1     Pending     0          62m
virt-launcher-always-run-strategy-vm-1707432374-0880806-9rf26     1/1     Running     0          62m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-2rb5v         0/2     Error       0          62m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-49q59         0/2     Error       0          42m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-5h9tm         0/2     Error       0          56m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-7hmmx         0/2     Error       0          7m21s
virt-launcher-fedora-hotplug-upg-1707432657-9790156-k2dsr         0/2     Error       0          48m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-mxvd2         0/2     Error       0          59m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-nctjv         2/2     Running     0          162m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-nj528         0/2     Error       0          58m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-nzv7c         0/2     Error       0          19m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-qdj6p         0/2     Completed   0          24m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-v5bbs         0/2     Error       0          36m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-w4gtm         0/2     Completed   0          30m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-w6dpd         0/2     Completed   0          13m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-x9w9v         0/2     Error       0          53m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-zrhbg         0/2     Error       0          95s
virt-launcher-manual-run-strategy-vm-1707432373-755645-f8gm8      1/1     Running     0          59m
virt-launcher-vm-bridge-connected-1707433221-2068923-8cgff        2/2     Running     0          62m
virt-launcher-vm-for-product-upgrade-ocs-1707432301-53998376q95   1/1     Running     0          85m
virt-launcher-vm-snapshot-upgrade-a-1707432958-4519243-5bc6p      1/1     Running     0          85m
virt-launcher-vma-macspoof-1707433341-3924508-6dptd               2/2     Running     0          106m
virt-launcher-vmb-macspoof-1707433342-277827-pxgs2                2/2     Running     0          106m
virt-launcher-windows-vm-1707432617-3495035-gdlnw                 1/1     Running     0          85m
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$
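To see at a glance how many launcher pods a single VM has accumulated in each state, the pod listing can be tallied per status. The sketch below assumes the default `oc get pods` text output and the `virt-launcher-<vm name>-<suffix>` pod naming visible above; `launcher_status_counts` is a hypothetical helper, and the sample is only a subset of the listing.

```python
from collections import Counter


def launcher_status_counts(oc_output: str, vm_name: str) -> Counter:
    """Tally the STATUS column for the virt-launcher pods of one VM."""
    prefix = f"virt-launcher-{vm_name}-"
    counts = Counter()
    for line in oc_output.strip().splitlines():
        fields = line.split()
        # Expect at least NAME, READY, STATUS; skip the header and other pods.
        if len(fields) < 3 or not fields[0].startswith(prefix):
            continue
        counts[fields[2]] += 1
    return counts


SAMPLE_PODS = """\
NAME READY STATUS RESTARTS AGE
hp-volume-d82rr 0/1 Pending 0 62m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-2rb5v 0/2 Error 0 62m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-nctjv 2/2 Running 0 162m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-qdj6p 0/2 Completed 0 24m
virt-launcher-fedora-hotplug-upg-1707432657-9790156-zrhbg 0/2 Error 0 95s
virt-launcher-windows-vm-1707432617-3495035-gdlnw 1/1 Running 0 85m
"""

counts = launcher_status_counts(SAMPLE_PODS, "fedora-hotplug-upg-1707432657-9790156")
for status, number in sorted(counts.items()):
    print(f"{status}: {number}")
```

A steadily growing number of Error and Completed launcher pods next to one long-running pod suggests that replacement pods for the VM keep being created and keep failing, while the original pod stays on the cordoned node.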
Expected results:
OCP upgrade should continue successfully
Additional info:
The cluster is available for triage. A must-gather will be attached.
- is cloned by: CNV-38277 [4.14] No ability to evict a hotplugged vm this is blocking ocp upgrade (Closed)
- links to: RHSA-2023:116760 OpenShift Virtualization 4.15.0 Images