OpenShift Bugs / OCPBUGS-32991

multus-admission-controller stuck in CrashLoopBackOff when egress IPs are created at scale [4.17]

    • Bug
    • Resolution: Done-Errata
    • Undefined
    • None
    • 4.14.z
    • Networking / multus
    • None
    • Important
    • No
    • False
    • Release Note Not Required
    • In Progress

      Description of problem:

      The Perf & Scale team is running scale tests to find the maximum number of supported egress IPs and came across this issue. With 55339 egress IP objects (each with one egress IP address) in a baremetal cluster with 118 worker nodes, the multus-admission-controller pod is stuck in the CrashLoopBackOff state.
      
      The "oc describe pod" output is available at http://storage.scalelab.redhat.com/anilvenkata/multus-admission/multus-admission-controller-84b896c8-kmvdk.describe
      
      "oc describe pod" shows that the names of all 55339 egress IPs are passed to the container's exec command:
      # cat multus-admission-controller-84b896c8-kmvdk.describe | grep ignore-namespaces | tr ',' '\n' | grep -c egressip
      55339
      
      The exec then fails because the argument list is too long:
      # oc logs  -n openshift-multus multus-admission-controller-84b896c8-kmvdk
      Defaulted container "multus-admission-controller" out of: multus-admission-controller, kube-rbac-proxy
      exec /bin/bash: argument list too long
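
      For a rough sense of scale, the sketch below estimates the byte size of a comma-separated list of 55339 names and compares it with the Linux limit for a single exec argument (MAX_ARG_STRLEN, 32 pages, which is 131072 bytes with 4 KiB pages). The "egressip-<n>" naming scheme is an assumption made for illustration, since the real names are not listed in this report, and it is also assumed that the whole list reaches bash as one argument.

```shell
#!/bin/sh
# Estimate the byte size of a comma-separated list of egress IP names.
# Assumption: names look like "egressip-<n>"; the real naming scheme is
# not given in this report.
count=55339
total=0
i=1
while [ "$i" -le "$count" ]; do
    name="egressip-$i"
    # Each name contributes its own length plus one separating comma.
    total=$((total + ${#name} + 1))
    i=$((i + 1))
done
list_bytes=$((total - 1))   # no comma after the last name

# Linux caps a single exec argument at MAX_ARG_STRLEN = 32 pages
# (131072 bytes with 4 KiB pages); a longer string fails with E2BIG,
# which is reported as "argument list too long".
per_arg_limit=$((32 * 4096))

echo "estimated list size:   $list_bytes bytes"
echo "single-argument limit: $per_arg_limit bytes"
if [ "$list_bytes" -gt "$per_arg_limit" ]; then
    echo "list exceeds the single-argument limit"
fi
```

      Under these assumptions the list is several times larger than the per-argument limit, which would be consistent with the "argument list too long" error above even if the total ARG_MAX budget were not exhausted.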
      
      # oc get co network
      NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      network   4.14.16   True        True          False      35d     Deployment "/openshift-multus/multus-admission-controller" update is rolling out (1 out of 3 updated)
      
      # oc describe pod -n openshift-multus multus-admission-controller-84b896c8-kmvdk > multus-admission-controller-84b896c8-kmvdk.describe
       
      # oc get pods -n openshift-multus  | grep multus-admission-controller
      multus-admission-controller-6c58c66ff9-5x9hn   2/2     Running            0                35d
      multus-admission-controller-6c58c66ff9-zv9pd   2/2     Running            0                35d
      multus-admission-controller-84b896c8-kmvdk     1/2     CrashLoopBackOff   26 (2m56s ago)   110m
      
      Because this environment has 55338 namespaces (each with 1 pod and 1 egress IP object), it will be hard to capture a must-gather.

      Version-Release number of selected component (if applicable):

          4.14.16

      How reproducible:

          always

      Steps to Reproduce:

          1. Use kube-burner to create 55339 egress IP objects, each with one egress IP address.
          2. The multus-admission-controller pod becomes stuck in CrashLoopBackOff.
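
          The kube-burner job used for the test is not attached here. As a stand-in, the sketch below prints a single OVN-Kubernetes EgressIP manifest for a given index, so the object shape in step 1 is concrete. This is not the kube-burner configuration: the function name, the object and namespace names, and the 10.0.x.y address derivation are all hypothetical, and real egress IPs must come from a subnet that the cluster's nodes can actually host.

```shell
#!/bin/sh
# Print one EgressIP manifest for the index given as $1.
# Assumptions (hypothetical, for illustration only): the object name,
# the selected namespace name, and the 10.0.<b>.<c> address scheme.
egressip_manifest() {
    idx=$1
    # Map the index onto 10.0.<b>.<c>, keeping the last octet in 1..254.
    b=$((idx / 254))
    c=$((idx % 254 + 1))
    cat <<EOF
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-$idx
spec:
  egressIPs:
  - 10.0.$b.$c
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: egressip-$idx
EOF
}

# Example: print the manifest for index 1.
egressip_manifest 1
```

          On a test cluster, the output could be piped to "oc apply -f -" in a loop over the indices; the mapping above stays within 10.0.0.0/16 for up to 65024 objects.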
          

      Actual results:

          

      Expected results:

          

      Additional info:

          

              tohayash@redhat.com Tomofumi Hayashi
              vkommadi@redhat.com VENKATA ANIL kumar KOMMADDI
              Sachin Ninganure