Red Hat OpenShift Data Science / RHODS-3191

Dashboard is causing high memory usage on openshift-kube-apiserver


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • RHODS_1.11.0_GA
    • RHODS_1.6.0_GA
    • UI
    • RHODS 1.11, RHODS 1.12
    • Important

      Description of problem:

      On clusters with very high resource counts, the ODH Dashboard is causing high memory usage on openshift-kube-apiserver. This creates memory pressure on the OpenShift master nodes where those pods reside, and can render the cluster unreachable if all available memory on a node is consumed.

      The performance test was the default toolchain-e2e test for the sandbox, which creates a large number of users and namespaces.

      https://github.com/codeready-toolchain/toolchain-e2e/tree/master/setup

      sum(container_memory_usage_bytes{namespace="openshift-kube-apiserver", pod=~"kube-apiserver-.*"}) amounted to 360 GB of RAM, and individual pods were over 60 GB.

      https://docs.google.com/spreadsheets/d/1RKt9jtTtv4Ft3ZlVk4sszC-BrF_0dbC46_FNHiLTaI0/edit#gid=0
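
      The figures above come from instant PromQL queries against the cluster's Prometheus. As a rough, hypothetical sketch (not part of the original test tooling), the per-pod breakdown behind the "individual pods were over 60 GB" observation can be pulled over the Prometheus HTTP API as follows; PROM_URL and PROM_TOKEN are assumed placeholders for the monitoring route and a bearer token with monitoring access:

      import os

      import requests

      # Assumed placeholders (not defined in this issue): the cluster Prometheus
      # route and a bearer token with monitoring access.
      PROM_URL = os.environ["PROM_URL"]
      TOKEN = os.environ["PROM_TOKEN"]

      # Per-pod breakdown of the metric that the sum above aggregates.
      QUERY = ('sum by (pod) (container_memory_usage_bytes{'
               'namespace="openshift-kube-apiserver", pod=~"kube-apiserver-.*"})')

      resp = requests.get(
          f"{PROM_URL}/api/v1/query",
          params={"query": QUERY},
          headers={"Authorization": f"Bearer {TOKEN}"},
          verify=False,  # many test clusters use a self-signed ingress certificate
      )
      resp.raise_for_status()

      # Each result entry is one kube-apiserver pod with its current memory usage.
      for sample in resp.json()["data"]["result"]:
          gib = float(sample["value"][1]) / 1024**3
          print(f'{sample["metric"]["pod"]}: {gib:.1f} GiB')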

      Prerequisites (if any, like setup, operators/versions):

      Steps to Reproduce:

      1. Create a cluster with m5.8xlarge master nodes (https://github.com/codeready-toolchain/toolchain-e2e/tree/master/setup#prereqs)
      2. Install RHODS
      3. Set up the sandbox operators (https://github.com/codeready-toolchain/toolchain-e2e/tree/master/setup#dev-sandbox-setup-1)
      4. Run the tests (https://github.com/codeready-toolchain/toolchain-e2e/tree/master/setup#provisioning-test-users-and-capturing-metrics)
        for 1 user and for 2000 users, and compare the results.
        Example:
        go run setup/main.go -users 2000 --default 2000 --custom 0 --username "user${RANDOM_NAME}" --workloads redhat-ods-operator:rhods-operator --workloads redhat-ods-applications:rhods-dashboard --workloads redhat-ods-operator:cloud-resource-operator --workloads redhat-ods-monitoring:blackbox-exporter --workloads redhat-ods-monitoring:grafana --workloads redhat-ods-monitoring:prometheus
      5. View the results (a minimal sketch for pulling the per-run numbers from Prometheus follows these steps)
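
      For step 5, the Average/Max figures reported under "Actual results" and "Expected results" below can be pulled from Prometheus after each run. A minimal sketch, assuming the same placeholder PROM_URL/PROM_TOKEN as in the description and an assumed 2h test window (adjust to the actual run duration); run it once after the 1-user run and once after the 2000-user run to get directly comparable numbers:

      import os

      import requests

      PROM_URL = os.environ["PROM_URL"]   # assumed placeholder: cluster Prometheus route
      TOKEN = os.environ["PROM_TOKEN"]    # assumed placeholder: token with monitoring access

      # Summed kube-apiserver container memory, as in the description above.
      SUM_EXPR = ('sum(container_memory_usage_bytes{namespace="openshift-kube-apiserver",'
                  ' pod=~"kube-apiserver-.*"})')

      def query(expr: str) -> float:
          """Run an instant PromQL query and return its single value in MB."""
          r = requests.get(
              f"{PROM_URL}/api/v1/query",
              params={"query": expr},
              headers={"Authorization": f"Bearer {TOKEN}"},
              verify=False,
          )
          r.raise_for_status()
          return float(r.json()["data"]["result"][0]["value"][1]) / 1024**2

      # Average and max of the summed metric over the (assumed) test window,
      # using PromQL subqueries sampled at 1m resolution.
      window = "2h"
      avg_mb = query(f"avg_over_time({SUM_EXPR}[{window}:1m])")
      max_mb = query(f"max_over_time({SUM_EXPR}[{window}:1m])")
      print(f"Average openshift-kube-apiserver: {avg_mb:.2f} MB")
      print(f"Max openshift-kube-apiserver: {max_mb:.2f} MB")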

      Actual results:

      Resulting memory usage for openshift-kube-apiserver (average and max) is very high (300+ GB):

      Average openshift-kube-apiserver: 418536.23 MB
      Max openshift-kube-apiserver: 418536.23 MB

      Usage with the same resources on the cluster before and after deploying the dashboard:

      Expected results:

      Resulting memory usage for openshift-kube-apiserver (average and max) should stay under 100 GB:

      Average openshift-kube-apiserver: 55374.12 MB
      Max openshift-kube-apiserver: 89703.79 MB

      Reproducibility (Always/Intermittent/Only Once):

      Build Details: v1.6.0-8 and 1.7.0-5

      Workaround: None

      Additional info:

              Jeffrey Phillips (jephilli@redhat.com)
              Chris Chase (cchase@redhat.com)
              Tarun Kumar
