Description
I deployed the OpenShift Sandboxed Containers Operator onto a single-node bare-metal cluster running OCP 4.14.12 (x86_64). The controller manager pod was continually OOMKilled until I manually raised the limits in the controller-manager deployment.
Steps to reproduce
- Deploy the OSC operator on a single-node bare-metal cluster
- Watch the controller-manager pod be OOMKilled repeatedly
- Manually raise the resource limits in the controller-manager deployment
- The controller-manager pod is no longer OOMKilled
Expected result
The controller manager pod is not OOMKilled when using the default resource limits that the operator provides for the controller-manager deployment.
Actual result
The controller manager pod is continually OOMKilled.
Impact
As far as I know, OSC containers cannot be deployed without a functioning controller manager. This error would appear to block the usage of OSC on a single node cluster until the workaround is applied.
Env
OCP 4.14.2
OSC version 1.5.2
Single-node OCP on bare metal with 64 CPU cores and 128 GB of memory
Additional helpful info
The error was resolved when I manually edited the controller manager deployment with the following limits.
```
resources:
  limits:
    cpu: 999m
    memory: 999Mi
  requests:
    cpu: 999m
    memory: 999Mi
```
The numbers above were chosen at random just to see if it would work. I did not test the minimum increase needed to avoid the problem. I have attached the controller manager logs to this ticket as well. The logs don't show the OOMKill messages, only the last message from the pod before it is killed. The last message was always the same in my testing: "Creating sandboxed containers dashboard in the OpenShift console".
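Since the container logs do not record the kill itself, the OOMKill can be confirmed from the pod status instead, where Kubernetes reports the termination reason under `status.containerStatuses[].state` or `lastState`. The following is a minimal Python sketch of that check, not part of the operator. The helper name `oom_killed_containers`, the container name `manager`, and the sample status are hypothetical and only for illustration.

```python
import json


def oom_killed_containers(pod):
    """Return names of containers whose current or previous state
    was terminated with reason OOMKilled."""
    names = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        # Check both the current state and the last recorded state.
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                names.append(cs["name"])
                break
    return names


# Hypothetical, trimmed pod status; the container name is an assumption.
sample = json.loads("""
{
  "status": {
    "containerStatuses": [
      {
        "name": "manager",
        "restartCount": 12,
        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}
      }
    ]
  }
}
""")

print(oom_killed_containers(sample))
```

The input would be the JSON produced by `oc get pod <pod-name> -n <namespace> -o json`, with the pod name and namespace filled in for the affected cluster.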
- is related to: KATA-3000 controller-manager hits OOM at image-creation time (Closed)
- links to: RHBA-2024:127642 RHBA: sandboxed-containers bug fix and enhancement update