  1. Openshift sandboxed containers
  2. KATA-2790

Controller Manager is OOM killed on Single Node Cluster


    • Type: Bug
    • Resolution: Done
    • Priority: Medium
    • OSC 1.6.0
    • OSC 1.5.2
    • Operator
    • None
    • False
    • None
    • False
    • Release note:
      .`controller-manager` pod fails with `out of memory` errors

      Previously, when the {osc-operator} was deployed on a single-node, bare-metal cluster running {openshift} 4.14.12, the `controller-manager` pod failed with `out of memory` errors. In the current release, the issue has been fixed by increasing the pod's resources.
    • Bug Fix
    • Done
    • 0
    • 0

      Description

      I deployed the OpenShift Sandboxed Containers Operator on a single-node, bare-metal cluster running OCP 4.14.12 (x86_64). The controller-manager pod was repeatedly OOMKilled until I manually raised the resource limits in the controller-manager deployment.

      Steps to reproduce

      1. Deploy the OSC Operator on a single-node, bare-metal cluster.
      2. Watch the controller-manager pod be OOMKilled repeatedly.
      3. Manually raise the resource limits in the controller-manager deployment.
      4. The controller-manager pod is no longer OOMKilled.

      Expected result

      The controller-manager pod is not OOMKilled when running with the default resource limits that the operator sets on the controller-manager deployment.

      Actual result

      The controller-manager pod is continually OOMKilled.

      Impact

      As far as I know, OSC containers cannot be deployed without a functioning controller manager. This error would appear to block the usage of OSC on a single node cluster until the workaround is applied.
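The workaround amounts to raising the resource requests and limits on the controller-manager deployment. The following is a minimal sketch of building such a change as a strategic merge patch, using the values reported under "Additional helpful info". The helper name `build_resources_patch` is hypothetical, and the container name `manager` is an assumption that should be checked against the actual deployment.

```python
import json


def build_resources_patch(container, cpu, memory):
    """Build a strategic merge patch that sets equal requests and limits
    on one container of a Deployment's pod template."""
    resources = {
        "limits": {"cpu": cpu, "memory": memory},
        "requests": {"cpu": cpu, "memory": memory},
    }
    return {
        "spec": {
            "template": {
                "spec": {
                    # Strategic merge matches containers by name, so only
                    # the named container is modified.
                    "containers": [{"name": container, "resources": resources}]
                }
            }
        }
    }


# "manager" is an assumed container name, not confirmed by this report.
# 999m / 999Mi are the arbitrary values the reporter used.
patch = build_resources_patch("manager", "999m", "999Mi")
print(json.dumps(patch))
```

The printed JSON could then be passed to `oc patch deployment controller-manager -n <namespace> --type=strategic -p '<json>'`, where `<namespace>` is whichever namespace the operator was installed into.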

      Env

      OCP 4.14.12

      OSC version 1.5.2

      Single-node OCP on bare metal with 64 CPU cores and 128 GB of memory

      Additional helpful info

      The error was resolved when I manually edited the controller manager deployment with the following limits.

      ```
      resources:
        limits:
          cpu: 999m
          memory: 999Mi
        requests:
          cpu: 999m
          memory: 999Mi
      ```

      The numbers above were chosen arbitrarily, just to see whether higher limits would work. I did not test for the minimum increase needed to avoid the problem. I have also attached the controller manager logs to this ticket. The logs don't show any OOMKill messages, only the last message from the pod before it is killed. In my testing that last message was always the same: "Creating sandboxed containers dashboard in the OpenShift console".
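Because the container logs do not record the kill itself, the termination reason has to be read from the pod status, where Kubernetes reports it under `containerStatuses[].lastState.terminated.reason`. The following is a minimal sketch, assuming the input is the parsed JSON from `oc get pods -o json`. The function name `find_oom_killed` is hypothetical.

```python
def find_oom_killed(pod_list):
    """Return (pod name, container name, restart count) for every container
    whose most recent termination reason was OOMKilled.

    pod_list is the parsed JSON of a Kubernetes PodList.
    """
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "")
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            # lastState is empty until the container has terminated once.
            terminated = status.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append(
                    (pod_name, status.get("name", ""), status.get("restartCount", 0))
                )
    return hits
```

One way to use it is to pipe `oc get pods -n <namespace> -o json` into a small script that calls `find_oom_killed(json.load(sys.stdin))` and prints the result. A pod that shows up with a growing restart count matches the behaviour described in this report.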

      manager.logs

              cmeadors@redhat.com Cameron Meadors
              dan5179 Dan Clark
              Dan Clark
              Avital Pinnick Avital Pinnick