Type: Bug
Resolution: Obsolete
Priority: Major
Severity: Moderate
Affects Version: CNV v4.15.0
Fix Version: None
Category: Quality / Stability / Reliability
Description of problem:
We see a surge of failing tests across all kubevirt lanes in which a VMI fails to be deleted within the 2-minute timeout:

https://main-jenkins-csb-cnvqe.apps.ocp-c1.prod.psi.redhat.com/job/test-kubevirt-cnv-4.15-network-ovn-ocs/172/testReport/(root)/Tests%20Suite/_sig_network__Services_Masquerade_interface_binding__without__a_service_matching_the_vmi_exposed_should_fail_to_reach_the_vmi/
https://main-jenkins-csb-cnvqe.apps.ocp-c1.prod.psi.redhat.com/job/test-kubevirt-cnv-4.15-compute-ocs/191/testReport/(root)/Tests%20Suite/_Serial__sig_compute_Infrastructure_changes_to_the_kubernetes_client_on_the_virt_handler_rate_limiter_should_lead_to_delayed_VMI_running_states/
https://main-jenkins-csb-cnvqe.apps.ocp-c1.prod.psi.redhat.com/job/test-kubevirt-cnv-4.15-storage-ocs/209/testReport/(root)/Tests%20Suite/_sig_storage__DataVolume_Integration__rfe_id_3188__crit_high__vendor_cnv_qe_redhat_com__level_system__Starting_a_VirtualMachineInstance_with_a_DataVolume_as_a_volume_source_Alpine_import__test_id_5252_should_be_successfully_started_when_using_a_PVC_volume_owned_by_a_DataVolume/
Version-Release number of selected component (if applicable):
CNV 4.15.0
How reproducible:
Very common on test lanes
Steps to Reproduce:
1. Run a kubevirt test lane on CNV 4.15.0 (any of the Jenkins jobs linked above).
2. Let a test delete its VMI, or delete one manually (see the check below).
3. Sporadically, the VMI fails to be deleted within the 2-minute timeout and its virt-launcher pod hangs in Terminating.
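The hang can also be checked directly; a minimal sketch (the VMI name is illustrative, derived from the virt-launcher pod name in the logs under Additional info):

$ kubectl delete vmi vm-cirros-source --timeout=2m
$ kubectl get pods -n default | grep Terminating

On an affected node the delete call times out and the virt-launcher pod stays in Terminating.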
Actual results:
The VMI is not deleted within the 2-minute timeout; its virt-launcher pod is stuck in Terminating because the guest-console-log container fails to stop (see the kubelet errors under Additional info).
Expected results:
The VMI and its virt-launcher pod are deleted well within the 2-minute timeout.
Additional info:
It's possible this is related to the new guest console log container. When such a VMI is stuck, the following errors are observable on the node:

sh-5.1# journalctl --no-pager | grep virt-launcher-vm-cirros-source-qwpcf | grep err
Dec 20 16:15:40 alex-rc0-w77cr-worker-0-z5pg2 kubenswrapper[3798]: E1220 16:15:40.865833 3798 kuberuntime_container.go:750] "Container termination failed with gracePeriod" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" pod="default/virt-launcher-vm-cirros-source-qwpcf" podUID="09b06c37-7f7e-4a77-9378-e72fe8e0d8bc" containerName="guest-console-log" containerID="cri-o://f9930f6da1186a1be274a4d673ff630481f1b7bd5100cb40c93b6d3429c983d8" gracePeriod=30
Dec 20 16:15:40 alex-rc0-w77cr-worker-0-z5pg2 kubenswrapper[3798]: E1220 16:15:40.865887 3798 kuberuntime_container.go:775] "Kill container failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" pod="default/virt-launcher-vm-cirros-source-qwpcf" podUID="09b06c37-7f7e-4a77-9378-e72fe8e0d8bc" containerName="guest-console-log" containerID={"Type":"cri-o","ID":"f9930f6da1186a1be274a4d673ff630481f1b7bd5100cb40c93b6d3429c983d8"}
Dec 20 16:15:41 alex-rc0-w77cr-worker-0-z5pg2 kubenswrapper[3798]: E1220 16:15:41.164844 3798 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"KillContainer\" for \"guest-console-log\" with KillContainerError: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\"" pod="default/virt-launcher-vm-cirros-source-qwpcf" podUID="09b06c37-7f7e-4a77-9378-e72fe8e0d8bc"

Is it possible that the guest console log container does not always react cleanly to termination signals?
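If so, the failure mode would be: the container's main process never acts on SIGTERM, CRI-O waits out the 30-second grace period, the StopContainer RPC hits its deadline, and kubelet logs the DeadlineExceeded errors above. A minimal sketch of a well-behaved entrypoint (illustrative only, not the actual guest-console-log image) that exits promptly instead:

#!/bin/sh
# Illustrative entrypoint: trap SIGTERM/SIGINT so kubelet's stop request
# ends the container immediately instead of escalating past gracePeriod=30.
trap 'exit 0' TERM INT
while :; do
    sleep 1   # stand-in for the console-log tail loop
done

The same applies to a compiled binary: it needs a signal handler that terminates the process within the grace period.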
Links:
- is related to: OCPBUGS-27949 Lazy pod removal with recent CRI-O releases (Closed)
- relates to: CNV-37706 Do not ship CNV v4.15.0 with SerialConsoleLog on by default (Closed)