Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: CNV v4.14.0
Affects Version/s: None
Component/s: CNV Virtualization
Labels:
- Scale
- cnv-4+
- cnvbugsm
- devel_ack+
- pm_ack+
- qa_ack+

Activity Type:
Quality / Stability / Reliability
Story Points:
5
Blocked:
False
Ready:
False
BZ Status:
CLOSED
BZ URL:
https://bugzilla.redhat.com/show_bug.cgi?id=2036027
Bugzilla Bug:
RHBZ: 2036027

Sprint:
CNV Virtualization Sprint 231, CNV Virtualization Sprint 232, CNV Virtualization Sprint 239, CNV Virtualization Sprint 240, CNV Virtualization Sprint 241
Severity:
Important

Regression:
No

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Some background:
-------------------------
I'm running a scale setup with 84 nodes, using OCS ceph-rbd as backend storage.
I'm attempting to deploy a large number of VMs, but I noticed that some VMs are missing.
This is a real problem for me, since it breaks my measurements and prevents VM deployment.
Looking at the virt-controller logs, we can see the following messages:

{"component":"virt-controller","level":"info","msg":"re-enqueuing VirtualMachine default/master-0-win10-vm0075","pos":"vm.go:175","reason":"Internal error occurred: failed calling webhook \"virtualmachine-validator.kubevirt.io\": Post \"https://virt-api.openshift-cnv.svc:443/virtualmachines-validate?timeout=10s\": context deadline exceeded","timestamp":"2021-12-29T10:26:12.657482Z"}
{"component":"virt-controller","kind":"","level":"error","msg":"Updating api version annotations failed","name":"master-0-win10-vm0043","namespace":"default","pos":"vm.go:209","reason":"Internal error occurred: failed calling webhook \"virtualmachine-validator.kubevirt.io\": Post \"https://virt-api.openshift-cnv.svc:443/virtualmachines-validate?timeout=10s\": context deadline exceeded","timestamp":"2021-12-29T10:26:17.599926Z","uid":"ace78d0a-482c-43f8-bb87-c2d16450467b"}
{"component":"virt-controller","level":"info","msg":"re-enqueuing VirtualMachine default/master-0-win10-vm0043","pos":"vm.go:175","reason":"Internal error occurred: failed calling webhook \"virtualmachine-validator.kubevirt.io\": Post \"https://virt-api.openshift-cnv.svc:443/virtualmachines-validate?timeout=10s\": context deadline exceeded","timestamp":"2021-12-29T10:26:17.599989Z"}
{"component":"virt-controller","kind":"","level":"error","msg":"Updating api version annotations failed","name":"master-0-win10-vm0002","namespace":"default","pos":"vm.go:209","reason":"Internal error occurred: failed calling webhook \"virtualmachine-validator.kubevirt.io\": Post \"https://virt-api.openshift-cnv.svc:443/virtualmachines-validate?timeout=10s\": context deadline exceeded","timestamp":"2021-12-29T10:26:18.298490Z","uid":"467ab09a-e3d0-4a09-8814-52d0ea91dd13"}
{"component":"virt-controller","level":"info","msg":"re-enqueuing VirtualMachine default/master-0-win10-vm0002","pos":"vm.go:175","reason":"Internal error occurred: failed calling webhook \"virtualmachine-validator.kubevirt.io\": Post \"https://virt-api.openshift-cnv.svc:443/virtualmachines-validate?timeout=10s\": context deadline exceeded","timestamp":"2021-12-29T10:26:18.298552Z"}
{"component":"virt-controller","kind":"","level":"error","msg":"Updating api version annotations failed","name":"master-0-win10-vm0007","namespace":"default","pos":"vm.go:209","reason":"Internal error occurred: failed calling webhook \"virtualmachine-validator.kubevirt.io\": Post \"https://virt-api.openshift-cnv.svc:443/virtualmachines-validate?timeout=10s\": context deadline exceeded","timestamp":"2021-12-29T10:26:22.670068Z","uid":"6640416a-bdad-450e-b11f-38508a3af158"}

[root@e26-h01-000-r640 ~]# oc logs virt-controller-655db5c9cf-rdqfg|grep "Internal error"|wc -l
148
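
The grep count above gives a single total. To see which VMs are affected, the same log stream could be grouped per VM. The following is only a rough sketch, not part of the original report: the function name `count_webhook_failures` is hypothetical, and it assumes each log entry is one JSON object per line with the `reason`, `msg`, `name` and `namespace` fields seen in the excerpt above.

```python
import json
from collections import Counter

# Webhook name as it appears in the "reason" field of the log excerpt.
WEBHOOK = "virtualmachine-validator.kubevirt.io"


def count_webhook_failures(lines):
    """Count validator-webhook timeout entries per VirtualMachine.

    `lines` is any iterable of log lines (hypothetical helper, assuming
    one JSON object per line as in the virt-controller output).
    """
    per_vm = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON log entry
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed entries
        reason = entry.get("reason", "")
        if WEBHOOK not in reason or "context deadline exceeded" not in reason:
            continue
        if "name" in entry:
            # "error" entries carry explicit name and namespace fields
            vm = f'{entry.get("namespace", "")}/{entry["name"]}'
        else:
            # "info" entries end with "<namespace>/<name>" in the msg field
            vm = entry.get("msg", "").rsplit(" ", 1)[-1]
        per_vm[vm] += 1
    return per_vm
```

One possible use is to save the sketch as a script, pipe the `oc logs` output into it, pass `sys.stdin` to `count_webhook_failures`, and print `most_common()` on the result to list the VMs with the most failed webhook calls.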

Another thing worth mentioning is that we never reached the 10s timeout: in most cases we get the "deadline exceeded" error almost immediately after submitting the deployment request (via YAML).

Versions of all relevant components:
===================================
CNV 4.9.1
OCS 4.9.0
LSO 4.9.0-202111151318
OCP 4.9.12

must-gather:
============
http://perf148h.perf.lab.eng.bos.redhat.com/share/BZ_logs/cnv_must_gather_failed_calling_webhook.tar.gz

External trackers:

Red Hat Errata Tool 113931

Red Hat Issue Tracker CNV-15550

Red Hat Product Errata RHSA-2023:6817

Assignee: Igor Bezukh

Reporter: Boaz Ben Shabat

QA Contact: Guy Chen

Votes: 0

Watchers: 1

Created: 2021/12/29 11:10 AM

Updated: 2025/08/09 8:40 PM

Resolved: 2023/11/08 2:05 PM
