-
Bug
-
Resolution: Cannot Reproduce
-
Critical
-
None
-
4.10
-
Critical
-
No
-
2
-
NHE Sprint 247
-
1
-
False
-
Description of problem:
VFs are removed from the bond when pod-level bonding is used with an active-passive (active-backup) bond. Affected pod: fms-gateway-cmdty-6ddfb6cd7d-zt48w in vig-dev
Version-Release number of selected component (if applicable):
How reproducible:
Occurring in the customer environment
Steps to Reproduce:
1. Configure an active-passive bond using 2 VFs
2. Check cat /proc/net/bonding/net3 from the pod
3. We can see the devices getting disconnected
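The check in step 2 can be automated. The following is a minimal sketch, assuming the usual "Slave Interface:" / "MII Status:" layout of the Linux bonding driver's /proc output. The function names and the expected link names (net1, net2, taken from the bond configuration in the error message) are illustrative and not part of any tool mentioned in this report.

```python
def parse_bond_status(text):
    """Parse the text of /proc/net/bonding/<bond>.

    Returns (active_slave, slaves), where slaves maps each slave
    interface name to its MII status string.
    """
    active = None
    slaves = {}
    current = None  # slave section we are currently inside, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            active = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = None
        elif key == "MII Status" and current is not None:
            # Only record per-slave status; the bond-level MII status
            # appears before the first "Slave Interface" line.
            slaves[current] = value
    return active, slaves


def missing_slaves(text, expected=("net1", "net2")):
    """Return the expected slave names that are absent from the bond."""
    _, slaves = parse_bond_status(text)
    return [name for name in expected if name not in slaves]
```

Inside the pod, the text could be read with open("/proc/net/bonding/net3").read() and passed to missing_slaves(); a non-empty result would indicate that a VF has dropped out of the bond.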
Actual results:
The VF interfaces go link down and get renamed:
[7203165.020151] mlx5_core 0000:98:07.4 net1: Link up
[7203165.512579] mlx5_core 0000:98:1e.0 net2: Link up
[7203166.047239] mlx5_core 0000:98:1e.0 ens2f1v111: renamed from net2
[7203166.547277] mlx5_core 0000:98:07.4 ens2f0v58: renamed from net1
The following is visible in the events:
KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_yield-curve-scheduler-7f85fbc7b5-dkl7x_vig-dev_e4055a4b-99f7-4bd4-9c9c-193c81aa1265_0(ce6c2ffd61f63887bf4ebcaf5e8c220dbe30a9fbb49591146b50792efbf801b0): error removing pod vig-dev_yield-curve-scheduler-7f85fbc7b5-dkl7x from CNI network \"multus-cni-network\": plugin type=\"multus\" name=\"multus-cni-network\" failed (delete): delegateDel: error invoking DelegateDel - \"bond\": error in getting result from DelNetwork: Failed to retrieve link objects from configuration file (&{NetConf:{CNIVersion:0.3.1 Name:bond-net1 Type:bond Capabilities:map[] IPAM:{Type:whereabouts} DNS:{Nameservers:[] Domain: Search:[] Options:[]} RawPrevResult:map[] PrevResult:<nil>} Mode:active-backup LinksContNs:true FailOverMac:1 Miimon:100 Links:[map[name:net1] map[name:net2]] MTU:1500}), error: Failed to confirm that link (net1) exists, error: Failed to lookup link name net1, error: Link not found / delegateDel: error invoking DelegateDel - \"sriov\": error in getting result from DelNetwork: failed to get netlink device with name net2: \"Link not found\" / delegateDel: error invoking DelegateDel - \"sriov\": error in getting result from DelNetwork: failed to get netlink device with name net1: \"Link not found\""
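To find which pod-side link names were renamed back to host VF names, the kernel log can be scanned for "renamed from" entries. This is a minimal sketch: the regular expression is written against the mlx5_core lines quoted above and may need adjusting for other drivers or log formats, and the helper name is hypothetical.

```python
import re

# Matches lines such as:
# [7203166.047239] mlx5_core 0000:98:1e.0 ens2f1v111: renamed from net2
RENAME_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+)\s+(?P<pci>[0-9a-fA-F:.]+)\s+"
    r"(?P<new>\S+): renamed from (?P<old>\S+)"
)


def find_renames(log_text):
    """Return (old_name, new_name, pci_address) for each rename entry."""
    return [(m["old"], m["new"], m["pci"]) for m in RENAME_RE.finditer(log_text)]
```

Applied to the log excerpt in this report, the function pairs net2 with ens2f1v111 (0000:98:1e.0) and net1 with ens2f0v58 (0000:98:07.4), which matches the link names the bond and sriov plugins later fail to find during delete.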
Expected results:
The pod functions normally with the net3 bond working
Additional info:
An nmstate operator auto-update last night at ~8:40 pm Central time seems to have triggered this issue on multiple prod clusters.
- depends on
-
RHEL-15275 NetworkManager and Sriov-network-operator coexistance in Openshift
- Closed
- is related to
-
OCPBUGS-18430 VFs are not showing on Nodes for SRIOV
- Closed
-
OCPBUGS-24050 Improve YAML example
- Closed