- Bug
- Resolution: Done
- Critical
- None
- CNV v4.20.0
- Quality / Stability / Reliability
- 0.42
- False
- False
- Critical
- Yes
We (perf&scale) have a workload called virt-density that creates 200 VMs per host in a 6-worker environment. We have been running this workload daily since OCP 4.18.
When we tried to onboard 4.20, we found that we were unable to complete the run.
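For context, the workload essentially bulk-creates minimal VMs. A hypothetical sketch of what that boils down to (the VM shape and container disk image here are illustrative only; the real implementation is in the step-registry links at the end of this description):

# Illustrative sketch only -- not the actual virt-density implementation.
for i in $(seq 1 200); do
  oc apply -n virt-density -f - <<EOF
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: virt-density-${i}
spec:
  runStrategy: Always
  template:
    spec:
      domain:
        resources:
          requests:
            memory: 128Mi
        devices:
          disks:
          - name: containerdisk
            disk:
              bus: virtio
      volumes:
      - name: containerdisk
        containerDisk:
          image: quay.io/kubevirt/cirros-container-disk-demo
EOF
done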
A lot of the virt-launcher pods have problems:
[root@m42-h01-000-r760 ~]# oc get po -n virt-density | tail
virt-launcher-virt-density-90-tpzz9   0/3   Error                       0   173m
virt-launcher-virt-density-91-nv6w2   3/3   Running                     0   44m
virt-launcher-virt-density-92-5q8d5   0/3   Init:ImageInspectError      0   15m
virt-launcher-virt-density-93-7zg7b   0/3   Init:ImageInspectError      0   15m
virt-launcher-virt-density-94-vr65v   0/3   Init:CreateContainerError   0   33m
virt-launcher-virt-density-95-lxssd   3/3   Running                     0   42m
virt-launcher-virt-density-96-qs7pj   3/3   Running                     1   3h9m
virt-launcher-virt-density-97-flhtg   3/3   Running                     0   4h16m
virt-launcher-virt-density-98-bt4n5   0/3   Init:ImageInspectError      0   15m
virt-launcher-virt-density-99-bfknn   1/3   Init:ImageInspectError      0   31m
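To quantify how widespread the failures are, a one-liner like the following (a suggested convenience, not part of the original triage) tallies the pod states across the namespace:

# Count pods per status in the virt-density namespace
oc get po -n virt-density --no-headers | awk '{print $3}' | sort | uniq -c | sort -rn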
Looking into one of the pods:
[root@m42-h01-000-r760 ~]# oc describe po virt-launcher-virt-density-92-5q8d5 | tail
Events:
  Type     Reason          Age                  From               Message
  ----     ------          ----                 ----               -------
  Normal   Scheduled       15m                  default-scheduler  Successfully assigned virt-density/virt-launcher-virt-density-92-5q8d5 to m42-h09-000-r760
  Normal   AddedInterface  15m                  multus             Add eth0 [10.129.2.253/23] from ovn-kubernetes
  Warning  Failed          11m (x2 over 13m)    kubelet            Error: context deadline exceeded
  Normal   Pulled          10m (x3 over 15m)    kubelet            Container image "quay.io/openshift-cnv/container-native-virtualization-virt-launcher-rhel9@sha256:8f99b9bcab79ae7d5fe92f17efc700778fb195fe8c61341c56ac344545a396dc" already present on machine
  Warning  Failed          8m33s                kubelet            Error: stream terminated by RST_STREAM with error code: CANCEL
  Warning  InspectFailed   31s (x4 over 6m32s)  kubelet            Failed to inspect image "": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  Failed          31s (x4 over 6m32s)  kubelet            Error: ImageInspectError
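The InspectFailed events are kubelet-to-CRI-O gRPC calls timing out, which points at the container runtime on the node being saturated rather than the image being missing. One way to confirm from the node itself (a sketch; assumes cluster-admin access, with the node name taken from the Scheduled event above):

# Look for runtime-side timeouts in the CRI-O journal on the affected node
oc debug node/m42-h09-000-r760 -- chroot /host journalctl -u crio --since "1 hour ago" --no-pager | grep -iE 'deadline|timeout' | tail -n 20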
The CNV components themselves seem to be failing during and after the test (they are healthy before it starts):
oc get pods | grep -v Run
NAME                                                READY   STATUS             RESTARTS         AGE
aaq-operator-744b9d7bf6-sb95f                       0/1     CrashLoopBackOff   21 (3m43s ago)   53m
cdi-apiserver-689cdf75f5-dg5m4                      0/1     CrashLoopBackOff   12 (2m43s ago)   45m
cdi-deployment-8474657b56-k5w2g                     0/1     CrashLoopBackOff   12 (2m59s ago)   45m
cluster-network-addons-operator-97556c5c9-7scfz     2/2     Terminating        0                7h32m
hco-operator-7ccd979ccc-x6xts                       0/1     CrashLoopBackOff   14 (5m7s ago)    53m
hco-webhook-5fdcfd7b78-pb84r                        0/1     CrashLoopBackOff   15 (3m51s ago)   53m
hostpath-provisioner-operator-788c9697d6-6grwc      0/1     CrashLoopBackOff   21 (109s ago)    53m
kubemacpool-cert-manager-6f94bfbfbd-mkrbm           1/1     Terminating        0                7h31m
kubevirt-console-plugin-56bf7bd6fb-bwqh4            1/1     Terminating        0                7h30m
kubevirt-ipam-controller-manager-7d6978d694-8dq57   1/1     Terminating        0                7h31m
ssp-operator-589f9f4576-sk2cl                       0/1     CrashLoopBackOff   16 (3m40s ago)   53m
ssp-operator-6874fc8966-5tl4d                       1/1     Terminating        1 (7h31m ago)    7h32m
virt-handler-khxrp                                  0/1     CrashLoopBackOff   25 (5m7s ago)    7h30m
virt-operator-74d9df7468-wj4pj                      0/1     CrashLoopBackOff   19 (4m57s ago)   53m
virt-operator-d687d8bcb-wq2g5                       0/1     Terminating        0                3h20m
virt-template-validator-748c84d6dc-zv2s9            1/1     Terminating        0                7h30m
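The HyperConverged operator's own view of the degradation may also be useful (a sketch, assuming the default CR name kubevirt-hyperconverged in openshift-cnv):

# Dump the HCO status conditions: type, status, message per line
oc get hyperconverged -n openshift-cnv kubevirt-hyperconverged -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'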
Looking into the virt-handler pod:
# oc describe po virt-handler-khxrp
...
Events:
  Type     Reason      Age                     From     Message
  ----     ------      ----                    ----     -------
  Warning  ProbeError  172m (x18 over 15h)     kubelet  Liveness probe error: Get "https://10.128.2.16:8443/healthz": dial tcp 10.128.2.16:8443: connect: connection refused body:
  Warning  ProbeError  148m (x95 over 17h)     kubelet  Readiness probe error: Get "https://10.128.2.16:8443/healthz": EOF body:
  Warning  ProbeError  138m (x210 over 17h)    kubelet  Liveness probe error: Get "https://10.128.2.16:8443/healthz": net/http: request canceled (Client.Timeout exceeded while awaiting headers) body:
  Warning  ProbeError  93m (x251 over 17h)     kubelet  Liveness probe error: Get "https://10.128.2.16:8443/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers) body:
  Warning  ProbeError  48m (x174 over 17h)     kubelet  Readiness probe error: Get "https://10.128.2.16:8443/healthz": dial tcp 10.128.2.16:8443: connect: connection refused body:
  Normal   Created     40m (x173 over 21h)     kubelet  Created container: virt-handler
  Warning  ProbeError  23m (x684 over 17h)     kubelet  Readiness probe error: Get "https://10.128.2.16:8443/healthz": net/http: request canceled (Client.Timeout exceeded while awaiting headers) body:
  Warning  BackOff     8m14s (x2156 over 15h)  kubelet  Back-off restarting failed container virt-handler in pod virt-handler-khxrp_openshift-cnv(65b84f8b-e952-4a41-b73f-4d05f0d07bb0)
  Normal   Pulled      6m34s (x178 over 17h)   kubelet  Container image "quay.io/openshift-cnv/container-native-virtualization-virt-handler-rhel9@sha256:7a09454ddbcaef244b4ce67f82b67d26e602e1577cef8f8f1e2086bf82429a3a" already present on machine
  Normal   Killing     4m39s (x179 over 17h)   kubelet  Container virt-handler failed liveness probe, will be restarted
  Warning  ProbeError  3m17s (x717 over 17h)   kubelet  Readiness probe error: Get "https://10.128.2.16:8443/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
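Since the liveness probe keeps restarting the container, the logs of the previous instance are the interesting ones, e.g.:

# Fetch the tail of the crashed (previous) virt-handler container's log
oc logs -n openshift-cnv virt-handler-khxrp -c virt-handler --previous --tail=50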
Some additional context in case it's helpful:
- How we deploy the CNV operator: https://github.com/openshift/release/blob/master/ci-operator/step-registry/openshift-qe/installer/bm/day2/cnv/
- How we run the test: https://github.com/openshift/release/tree/master/ci-operator/step-registry/openshift-qe/virt-density
- is triggering: CNV-63338 [vme-perf] VM live migration to a specific node (New)