  1. Red Hat OpenShift Data Science
  2. RHODS-2732

Long-term Fix for Operator OOM killed on clusters with lots of namespaces


    • MODH Sprint 37, MODH Sprint 1.7

      Description of problem:

      On a cluster with many namespaces (possibly more than 500; the exact threshold is unconfirmed), the rhods-operator pod's memory footprint grows past its 1GB RAM limit and the pod is OOM killed. This sends the operator into a never-ending crash loop, rendering it non-functional.

      Prerequisites (if any, like setup, operators/versions):

       

      Steps to Reproduce

      1. Install RHODS
      2. Create 1000 namespaces
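
      Step 2 can be scripted. The following is a minimal sketch, not the procedure used by the reporter: it assumes the `oc` CLI is installed and logged in to the cluster, and the `oom-repro-` name prefix and the function names are hypothetical. By default it only prints the commands instead of running them.

```python
import subprocess


def namespace_commands(count=1000, prefix="oom-repro-"):
    """Build one `oc create namespace` command per namespace.

    The prefix is a hypothetical naming choice; any valid DNS label works.
    """
    return [["oc", "create", "namespace", f"{prefix}{i:04d}"] for i in range(count)]


def create_namespaces(commands, dry_run=True):
    """Print the commands (dry run) or execute them against the cluster."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = namespace_commands()
    # Show only the first few commands; pass dry_run=False to really create them.
    create_namespaces(cmds[:3], dry_run=True)
    print(f"... {len(cmds)} commands in total")
```

      Running with dry_run=False against a test cluster should reproduce the namespace count from step 2; remember to delete the namespaces afterwards.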

      Actual results:

      Operator memory grows past 1GB and the operator starts crash looping.

      Expected results:

      Operator memory stays manageable and does not keep growing. The operator continues to function normally.

      Reproducibility (Always/Intermittent/Only Once):

      Always

      Build Details:

      v1.3.0-6

      Workaround:

      On clusters with a manual RHODS installation (olminstall), raise the operator memory limit to 8GB in the CSV. This does not work on addon installations, because Hive overwrites the change.
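
      The workaround could be applied as a JSON patch against the CSV. The sketch below only builds the patch document; it assumes the operator is the first deployment and first container listed in the CSV, that a memory limit is already set there, and it writes 8GB as the Kubernetes quantity "8Gi". The function name and index defaults are illustrative, not taken from the RHODS CSV.

```python
import json


def build_memory_limit_patch(limit="8Gi", deployment_index=0, container_index=0):
    """Return a JSON Patch that replaces the container memory limit in a CSV.

    The indices are assumptions; check the actual CSV to find the operator's
    deployment and container positions.
    """
    path = (
        f"/spec/install/spec/deployments/{deployment_index}"
        f"/spec/template/spec/containers/{container_index}"
        "/resources/limits/memory"
    )
    return [{"op": "replace", "path": path, "value": limit}]


if __name__ == "__main__":
    # Print the patch so it can be reviewed before being applied.
    print(json.dumps(build_memory_limit_patch(), indent=2))
```

      The printed JSON could then be passed to `oc patch csv <csv-name> -n <namespace> --type=json -p '<patch>'`, where the CSV name and namespace are placeholders for the values on the affected cluster.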

      Additional info:

      We cannot migrate from the current rhods-sandbox to the production sandbox until this is fixed, because production sandbox clusters have up to 1500 users and 3000 namespaces each.

      Operator Memory (max rhods-operator memory usage)
      Baseline:                     422.98 MB
      100 Users / 200 Namespaces:   767.34 MB
      500 Users / 1000 Namespaces: 1733.28 MB
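
      As a rough sanity check, the two loaded measurements can be extrapolated to the 3000-namespace production case. This is only a sketch: it assumes memory keeps growing linearly with namespace count beyond the measured range, which the issue does not confirm, and the function names are illustrative.

```python
# Measurements from this issue: (namespaces, max rhods-operator memory in MB).
MEASURED = [(200, 767.34), (1000, 1733.28)]


def slope_mb_per_namespace(p1, p2):
    """Memory growth per additional namespace between two measurements."""
    (n1, m1), (n2, m2) = p1, p2
    return (m2 - m1) / (n2 - n1)


def extrapolate_memory_mb(namespaces, points=MEASURED):
    """Linearly extrapolate from the last two measured points (an assumption)."""
    p1, p2 = points[-2], points[-1]
    return p2[1] + slope_mb_per_namespace(p1, p2) * (namespaces - p2[0])


if __name__ == "__main__":
    print(f"{slope_mb_per_namespace(*MEASURED):.3f} MB per namespace")
    print(f"{extrapolate_memory_mb(3000):.2f} MB estimated at 3000 namespaces")
```

      Under that linear assumption the estimate at 3000 namespaces is about 4150 MB, well above the 1GB limit, so the figure is only an indication of scale and not a measured result.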

          

              cchase@redhat.com Chris Chase
              Tarun Kumar