- Bug
- Resolution: Done-Errata
- Undefined
- None
- 4.14.z
- None
Description of problem:
The Perf & Scale team is running scale tests to find the maximum number of supported egress IPs and came across this issue. When we have 55339 EgressIP objects (each EgressIP object with one egress IP address) in a 118 worker node bare-metal cluster, the multus-admission-controller pod is stuck in CrashLoopBackOff state.

The "oc describe pod" command output is copied here: http://storage.scalelab.redhat.com/anilvenkata/multus-admission/multus-admission-controller-84b896c8-kmvdk.describe

"oc describe pod" shows that the names of all 55339 egress IPs are passed to the container's exec command:

# cat multus-admission-controller-84b896c8-kmvdk.describe | grep ignore-namespaces | tr ',' '\n' | grep -c egressip
55339

and the exec command is failing as this argument list is too long:

# oc logs -n openshift-multus multus-admission-controller-84b896c8-kmvdk
Defaulted container "multus-admission-controller" out of: multus-admission-controller, kube-rbac-proxy
exec /bin/bash: argument list too long

# oc get co network
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
network   4.14.16   True        True          False      35d     Deployment "/openshift-multus/multus-admission-controller" update is rolling out (1 out of 3 updated)

# oc describe pod -n openshift-multus multus-admission-controller-84b896c8-kmvdk > multus-admission-controller-84b896c8-kmvdk.describe

# oc get pods -n openshift-multus | grep multus-admission-controller
multus-admission-controller-6c58c66ff9-5x9hn   2/2   Running            0                35d
multus-admission-controller-6c58c66ff9-zv9pd   2/2   Running            0                35d
multus-admission-controller-84b896c8-kmvdk     1/2   CrashLoopBackOff   26 (2m56s ago)   110m

As this environment has 55338 namespaces (each namespace with 1 pod and 1 EgressIP object), it will be hard to capture a must-gather.
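The "argument list too long" error corresponds to the kernel's exec size limits (ARG_MAX for the total argv plus environment; Linux also typically caps a single argument string at about 128 KiB, which a comma-separated list of 55339 namespace names exceeds). A rough way to compare the ignore-namespaces argument size against those limits, as a sketch using the describe output file saved above:

# grep ignore-namespaces multus-admission-controller-84b896c8-kmvdk.describe | wc -c
# getconf ARG_MAX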
Version-Release number of selected component (if applicable):
4.14.16
How reproducible:
always
Steps to Reproduce:
1. Use kube-burner to create 55339 EgressIP objects, each object with one egress IP address (a minimal sketch of one such object is shown below).
2. Observe the multus-admission-controller pod stuck in CrashLoopBackOff.
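A minimal sketch of one such EgressIP object for step 1 (assuming the OVN-Kubernetes k8s.ovn.org/v1 EgressIP API; the object name, IP address, and namespace label are illustrative placeholders, not the values used in the test):

# oc apply -f - <<EOF
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-1
spec:
  egressIPs:
  - 192.168.10.1
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: egressip-ns-1
EOF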
Actual results:
Expected results:
Additional info:
- clones
  - OCPBUGS-32989 multus-admission-controller stuck in CrashLoopBackOff when egress IPs are created at scale [4.15] - Closed
- is cloned by
  - OCPBUGS-34214 multus-admission-controller stuck in CrashLoopBackOff when egress IPs are created at scale [4.16] - Closed
- is depended on by
  - OCPBUGS-34214 multus-admission-controller stuck in CrashLoopBackOff when egress IPs are created at scale [4.16] - Closed
- links to
  - RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update