Type: Bug
Status: CLOSED
Resolution: Done-Errata
Priority: Critical
Severity: Critical
Category: Quality / Stability / Reliability
Description of problem:
A VM with a secondary NIC fails to start.
Version-Release number of selected component (if applicable):
CNV: brew.registry.redhat.io/rh-osbs/iib:545376 (v4.14.0.rhel9-1403)
cluster-network-addons-operator-rhel9:v4.14.0-23
multus-dynamic-networks-rhel9:v4.14.0-23
How reproducible:
100% (observed several times, on different clusters)
Steps to Reproduce:
1. Create a linux-bridge on the cluster workers using a NodeNetworkConfigurationPolicy like the attached bridge-nncp.yaml:
$ oc apply -f bridge-nncp.yaml
nodenetworkconfigurationpolicy.nmstate.io/linux-bridge-nncp created
$
$ oc get nncp -w
NAME STATUS REASON
linux-bridge-nncp
linux-bridge-nncp Progressing ConfigurationProgressing
...
linux-bridge-nncp Progressing ConfigurationProgressing
linux-bridge-nncp Available SuccessfullyConfigured
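The attached bridge-nncp.yaml is not inlined in this report; a minimal sketch of such a policy, assuming the bridge is named br1test and enslaves a hypothetical worker NIC named ens9, could look like:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: linux-bridge-nncp
spec:
  desiredState:
    interfaces:
      - name: br1test
        type: linux-bridge
        state: up
        ipv4:
          enabled: false
        bridge:
          options:
            stp:
              enabled: false
          port:
            - name: ens9   # hypothetical worker NIC enslaved to the bridge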
2. Create a NetworkAttachmentDefinition like the attached bridge-nad.yaml:
$ oc apply -f bridge-nad.yaml
networkattachmentdefinition.k8s.cni.cncf.io/br1test-nad created
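The attached bridge-nad.yaml is likewise not inlined; given that the failure below references the cnv-bridge CNI plugin and a network named br1test, a minimal sketch of such a NetworkAttachmentDefinition might be:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: br1test-nad
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "br1test",
      "type": "cnv-bridge",
      "bridge": "br1test"
    }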
3. Create and start a VM like the one in the attached vma.yaml:
(Please note that I scheduled the VM on a specific node using a nodeSelector, just so I know which node's journalctl to follow for debugging.)
$ oc apply -f vma.yaml
virtualmachine.kubevirt.io/vma created
$
$ virtctl start vma
VM vma was scheduled to start
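The attached vma.yaml is also not inlined; the networking-relevant parts of such a VM (node name, container disk image, and resource sizes are assumptions) would look roughly like:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vma
spec:
  running: false
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: worker-0   # hypothetical node, pinned for easier journalctl debugging
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
          interfaces:
            - name: default
              masquerade: {}
            - name: secondary          # the secondary NIC, bridged on the node
              bridge: {}
        resources:
          requests:
            memory: 2Gi
      networks:
        - name: default
          pod: {}
        - name: secondary
          multus:
            networkName: br1test-nad   # the NAD created in step 2
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:38   # the report covers Fedora 37 and 38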
Actual results:
A VMI is created, but it is stuck in the `Scheduling` state.
Expected results:
The VMI should complete scheduling and run successfully.
Additional info:
1. The following failure appears in the virt-launcher pod (the full virt-launcher describe output is attached):
Warning FailedCreatePodSandBox 4s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_virt-launcher-vma-p2hvq_yoss-ns_fdc21910-7def-499f-b7b5-f47e945ebfac_0(8ac7f5a3d9da6f351c8a48c577dac4b14c78833efdbd20319a7fa4b1551df90f): error adding pod yoss-ns_virt-launcher-vma-p2hvq to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [yoss-ns/virt-launcher-vma-p2hvq/fdc21910-7def-499f-b7b5-f47e945ebfac:br1test]: error adding container to network "br1test": failed to find plugin "cnv-bridge" in path [/opt/multus/bin /var/lib/cni/bin /usr/libexec/cni]
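One way to confirm that the plugin binary is indeed absent from the node is to list the CNI binary paths from the error message via a debug pod (the node name here is an assumption):

$ oc debug node/worker-0
sh-5.1# chroot /host
sh-5.1# ls /opt/multus/bin /var/lib/cni/bin /usr/libexec/cni 2>/dev/null | grep -i bridge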
2. The issue occurs when attempting to schedule Fedora 37 and Fedora 38 VMs (I didn't check other OSes or versions).
3. The bug was found on an OVN-Kubernetes cluster. I didn't check it on OpenShiftSDN.
4. The same scenario with an ovs-bridge on the node passed successfully.
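For comparison, the ovs-bridge variant of step 1 replaces the linux-bridge interface in the NNCP's desiredState with an OVS bridge; a sketch (bridge and NIC names are assumptions):

    interfaces:
      - name: ovs-br1           # hypothetical OVS bridge name
        type: ovs-bridge
        state: up
        bridge:
          port:
            - name: ens9        # hypothetical worker NIC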
5. In CNV 4.14.0, this scenario was last observed to pass with v4.14.0.rhel9-1328 (it may have passed on later builds as well, but that is the last build for which we have actual evidence of it passing).
According to VersionExplorer, this bundle includes:
- multus-dynamic-networks-rhel9 v4.14.0-17
- cluster-network-addons-operator-rhel9 v4.14.0-17
6. Also attached are:
- journalctl output from the node on which the VMI was supposed to be scheduled (one capture for the Fedora 37 case and one for the Fedora 38 case)
- virt-launcher pod describe output (one for the Fedora 37 case and one for the Fedora 38 case)
- external trackers