Details
Bug
Resolution: Duplicate
Critical
None
4.12
None
Moderate
No
False
Description
Description of problem:
Pods that use bond-cni fail to delete, and the Pod remains stuck in Deleting status:
Events:
  Type     Reason             Age                   From     Message
  ----     ------             ----                  ----     -------
  Normal   Killing            161m                  kubelet  Stopping container vru-cudr-dsa-mp
  Warning  FailedPreStopHook  160m                  kubelet  Exec lifecycle hook ([/home/mcm_prestop]) for Container "vru-cudr-dsa-mp" in Pod "sc-cudr-dsa-mp-0-0-1-0_uspp-ft-5(19bb5a66-773a-4fd6-9c6c-12c27826b254)" failed - error: command '/home/mcm_prestop' exited with 137: , message: "send prestop msg success.\r\n"
  Warning  FailedKillPod      160m                  kubelet  error killing pod: failed to "KillPodSandbox" for "19bb5a66-773a-4fd6-9c6c-12c27826b254" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_sc-cudr-dsa-mp-0-0-1-0_uspp-ft-5_19bb5a66-773a-4fd6-9c6c-12c27826b254_0(2883ce6bac3e986aeeb4ad167f0878230c89ce422c161d3419d11b3c7d4df59b): error removing pod uspp-ft-5_sc-cudr-dsa-mp-0-0-1-0 from CNI network \"multus-cni-network\": plugin type=\"multus\" name=\"multus-cni-network\" failed (delete): delegateDel: error invoking DelegateDel - \"bond\": error in getting result from DelNetwork: Failed to retrieve link objects from configuration file (&{NetConf:{CNIVersion:0.3.1 Name:uspp-ft-5-bond-net-sig-kernel Type:bond Capabilities:map[] IPAM:{Type:whereabouts} DNS:{Nameservers:[] Domain: Search:[] Options:[]} RawPrevResult:map[] PrevResult:<nil>} Mode:active-backup LinksContNs:true FailOverMac:1 Miimon:100 Links:[map[name:svc-sigk-left0] map[name:svc-sigk-left1]] MTU:1800}), error: Failed to confirm that link (svc-sigk-left0) exists, error: Failed to lookup link name svc-sigk-left0, error: Link not found / delegateDel: error invoking DelegateDel - \"sriov\": error in getting result from DelNetwork: failed to get netlink device with name svc-sigk-left1: \"Link not found\" / delegateDel: error invoking DelegateDel - \"sriov\": error in getting result from DelNetwork: failed to get netlink device with name svc-sigk-left0: \"Link not found\""
  Warning  FailedKillPod      59s (x557 over 160m)  kubelet  error killing pod: failed to "KillPodSandbox" for "19bb5a66-773a-4fd6-9c6c-12c27826b254" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_sc-cudr-dsa-mp-0-0-1-0_uspp-ft-5_19bb5a66-773a-4fd6-9c6c-12c27826b254_0(2883ce6bac3e986aeeb4ad167f0878230c89ce422c161d3419d11b3c7d4df59b): error removing pod uspp-ft-5_sc-cudr-dsa-mp-0-0-1-0 from CNI network \"multus-cni-network\": plugin type=\"multus\" name=\"multus-cni-network\" failed (delete): delegateDel: error invoking DelegateDel - \"bond\": error in getting result from DelNetwork: Failed to retrieve link objects from configuration file (&{NetConf:{CNIVersion:0.3.1 Name:uspp-ft-5-bond-net-sig-kernel Type:bond Capabilities:map[] IPAM:{Type:whereabouts} DNS:{Nameservers:[] Domain: Search:[] Options:[]} RawPrevResult:map[] PrevResult:<nil>} Mode:active-backup LinksContNs:true FailOverMac:1 Miimon:100 Links:[map[name:svc-sigk-left0] map[name:svc-sigk-left1]] MTU:1800}), error: Failed to confirm that link (svc-sigk-left0) exists, error: Failed to lookup link name svc-sigk-left0, error: Link not found"
The "Failed to confirm that link" message comes from here: https://github.com/openshift/bond-cni/blob/release-4.12/bond/bond.go#L95-L96
    _, ok := err.(netlink.LinkNotFoundError)
    if !ok || !isDel || !bondConf.LinksContNs {
        return nil, fmt.Errorf("Failed to confirm that link (%+v) exists, error: %+v", linkName, err)
    }
    } else {
As the error message above shows, the underlying error was "Link not found", so the type assertion err.(netlink.LinkNotFoundError) should yield ok == true. This is also a DEL command, so isDel should be true. Finally, the net-attach-def sets "linksInContainer" to true:
spec:
  config: '{
    "type": "bond",
    "cniVersion": "0.3.1",
    "name": "uspp-ft-5-bond-net-sig-kernel",
    "mode": "active-backup",
    "failOverMac": 1,
    "linksInContainer": true,
    "miimon": "100",
    "mtu": 1800,
    "links": [
      {"name": "svc-sigk-left0"},
      {"name": "svc-sigk-left1"}
    ],
    "ipam": {
      "type": "whereabouts",
      "range": "193.21.10.0/24",
      "range_end": "193.21.10.253",
      "range_start": "193.21.10.101",
      "gateway": "193.21.10.1"
    }
  }'
With all three values true, the condition "if !ok || !isDel || !bondConf.LinksContNs" should evaluate to false, so the code should not enter this branch and return the error.
Version-Release number of selected component (if applicable):
4.12.15
sriov-network-operator.v4.12.0-202305101515
Pod using bond-cni with VFs as bond slaves
How reproducible:
Intermittently, at the customer's site
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info: