OCPBUGS-74500

Multus fails to create interface during high pod churn: rejected by rate limit

    • Type: Bug
    • Resolution: Unresolved
    • 4.20
    • Networking / multus

      When churning (deleting and re-creating) a large number of pods, I see the following error:

      Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_curl-1-8-75c5467d87-clqkq_node-density-cni-0_ace5d1a2-c39d-4e7c-94ec-b8d164fd929b_0(d2de5f1128dc0db112d21643b42339afbbf0f8ffde5da07c32acbc0c4e0b4543): error adding pod node-density-cni-0_curl-1-8-75c5467d87-clqkq to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"d2de5f1128dc0db112d21643b42339afbbf0f8ffde5da07c32acbc0c4e0b4543" Netns:"/var/run/netns/782e7182-a3da-4c76-b9a0-a5d3577f22e1" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=node-density-cni-0;K8S_POD_NAME=curl-1-8-75c5467d87-clqkq;K8S_POD_INFRA_CONTAINER_ID=d2de5f1128dc0db112d21643b42339afbbf0f8ffde5da07c32acbc0c4e0b4543;K8S_POD_UID=ace5d1a2-c39d-4e7c-94ec-b8d164fd929b" Path:"" ERRORED: error configuring pod [node-density-cni-0/curl-1-8-75c5467d87-clqkq] networking: Multus: [node-density-cni-0/curl-1-8-75c5467d87-clqkq/ace5d1a2-c39d-4e7c-94ec-b8d164fd929b]: error loading k8s delegates k8s args: TryLoadPodDelegates: error in getting k8s network for pod: GetNetworkDelegates: failed getting the delegate: getKubernetesDelegate: failed to get a ResourceClient instance: getKubeletClient: error getting pod resources from client: getPodResources: failed to list pod resources, &{0xc000eda408}.Get(_) = _, rpc error: code = ResourceExhausted desc = rejected by rate limit

       

      Version-Release number of selected component (if applicable):

          4.20

      How reproducible:

          Easily, by using kube-burner to churn more than 300 pods with a QPS above 10.

      Steps to Reproduce:

          1. kube-burner-ocp node-density-cni --extract 
          2. Modify the workload to add additional interfaces to the pods
          3. kube-burner-ocp node-density-cni --qps 20 --churn-cycles 1 --churn-duration 1h --churn-delay 2m --pods-per-node=200

      Actual results:

          The error above causes pod creation to fail: all existing interfaces are deleted and pod creation is rescheduled, which lengthens the overall churn duration.

      Expected results:

      I understand that the rate limit is there for a reason, but as far as I understand, we call this API for each interface and pod. Maybe there is a way to optimize this, for example by caching the results? Alternatively, Multus could handle the error itself (wait and retry), which would improve resiliency during these periods of high demand.

       

      Opening this ticket to explore possible improvements.

       

      Additional Information:

      The rate limit seems to be hardcoded in the kubelet's podresources gRPC package:
      https://pkg.go.dev/k8s.io/kubernetes/pkg/kubelet/apis/podresources/grpc

       

              bpickard@redhat.com Ben Pickard
              amorenoz@redhat.com Adrian Moreno
              Weibin Liang