Type: Bug
Resolution: Unresolved
Priority: Normal
Component: Quality / Stability / Reliability
Description of problem:
If the cluster has come into a state in which live migrations triggered by kubevirt-workload-update are constantly failing, a new target virt-launcher pod is created every 5 minutes for every VM in the cluster. This results in thousands of virt-launcher pods in "Error" state lying around the cluster, overloading etcd, and it might cause the cluster to become less responsive within a short time.
Version-Release number of selected component (if applicable):
all versions
How reproducible:
If VMIMs (VirtualMachineInstanceMigration objects) are failing, 100%
Steps to Reproduce:
1. Reproduce bug https://issues.redhat.com/browse/RHEL-131697 (for example).
2. Observe that a new virt-launcher target pod is created every 5 minutes and then fails, over and over again, for every VM in the cluster.
Actual results:
Thousands of errored virt-launcher pods accumulate on the cluster within a single day.
Expected results:
There should be an exponential backoff mechanism in such a case.
Additional info:
If there is a permanent issue, retries shouldn't be executed at constant 5-minute intervals. Instead, the delay before each retry should be twice as long as the previous one, up to an upper limit of, for example, 4 hours. A sketch of such a backoff calculation is shown below.
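As an illustration of the proposed behavior, here is a minimal sketch in Go of a backoff calculation that doubles the 5-minute interval after each consecutive failure and caps it at 4 hours. The function name, signature, and values are hypothetical and not part of KubeVirt's actual code.

{code:go}
package main

import (
	"fmt"
	"time"
)

// backoffDelay returns the delay before the next migration retry.
// It doubles the base interval for each consecutive failure and caps
// the result at maxDelay. Purely illustrative; not KubeVirt code.
func backoffDelay(base, maxDelay time.Duration, failures int) time.Duration {
	delay := base
	for i := 0; i < failures; i++ {
		delay *= 2
		if delay >= maxDelay {
			return maxDelay
		}
	}
	return delay
}

func main() {
	base := 5 * time.Minute
	limit := 4 * time.Hour
	// After enough consecutive failures the retry interval stops growing at the cap.
	for failures := 0; failures <= 8; failures++ {
		fmt.Printf("after %d failed migrations: next retry in %v\n",
			failures, backoffDelay(base, limit, failures))
	}
}
{code}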
Clones:
- CNV-74856 [Tracker] Live migration after workload update fails with operation failed: guest CPU doesn't match specification: missing features: pdcm (status: New)