Type: Bug
Resolution: Unresolved
Affects Version: 4.20
Description:
When churning (deleting & re-creating) a high number of pods, I can see the following error:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_curl-1-8-75c5467d87-clqkq_node-density-cni-0_ace5d1a2-c39d-4e7c-94ec-b8d164fd929b_0(d2de5f1128dc0db112d21643b42339afbbf0f8ffde5da07c32acbc0c4e0b4543): error adding pod node-density-cni-0_curl-1-8-75c5467d87-clqkq to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"d2de5f1128dc0db112d21643b42339afbbf0f8ffde5da07c32acbc0c4e0b4543" Netns:"/var/run/netns/782e7182-a3da-4c76-b9a0-a5d3577f22e1" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=node-density-cni-0;K8S_POD_NAME=curl-1-8-75c5467d87-clqkq;K8S_POD_INFRA_CONTAINER_ID=d2de5f1128dc0db112d21643b42339afbbf0f8ffde5da07c32acbc0c4e0b4543;K8S_POD_UID=ace5d1a2-c39d-4e7c-94ec-b8d164fd929b" Path:"" ERRORED: error configuring pod [node-density-cni-0/curl-1-8-75c5467d87-clqkq] networking: Multus: [node-density-cni-0/curl-1-8-75c5467d87-clqkq/ace5d1a2-c39d-4e7c-94ec-b8d164fd929b]: error loading k8s delegates k8s args: TryLoadPodDelegates: error in getting k8s network for pod: GetNetworkDelegates: failed getting the delegate: getKubernetesDelegate: failed to get a ResourceClient instance: getKubeletClient: error getting pod resources from client: getPodResources: failed to list pod resources, &{0xc000eda408}.Get(_) = _, rpc error: code = ResourceExhausted desc = rejected by rate limit
Version-Release number of selected component (if applicable):
4.20
How reproducible:
Easily reproducible using kube-burner, churning > 300 pods with QPS > 10.
Steps to Reproduce:
1. kube-burner-ocp node-density-cni --extract
2. Modify the workload to add additional interfaces to the pods
3. kube-burner-ocp node-density-cni --qps 20 --churn-cycles 1 --churn-duration 1h --churn-delay 2m --pods-per-node=200
Actual results:
The error above makes pod creation fail; all existing interfaces are deleted and the pod creation is re-scheduled, delaying the overall churn duration.
Expected results:
I understand the rate limit is there for a reason, but IIUC this API is called for each interface and each pod. Maybe there is a way to optimize that, e.g. by caching the results somehow? Or maybe there is a way to handle the error within Multus (wait & retry?) that improves resiliency during these periods of high demand.
Opening this ticket to explore possible improvements.
Additional Information:
The rate limit seems to be hardcoded in the kubelet podresources client:
https://pkg.go.dev/k8s.io/kubernetes/pkg/kubelet/apis/podresources/grpc