OCPBUGS-27818

High memory usage by Kube APIServer on HostedCluster upgrades


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • 4.15.0
    • 4.14, 4.15, 4.16
    • HyperShift
    • Important
    • No
    • Proposed
    • False
    • Cause - Kube API Server pods are OOMKilled while a Hosted Cluster with a large number of worker nodes is being upgraded.
      Consequence - ROSA HCP clusters with more than 51 worker nodes cannot be upgraded.
      Fix - GOMEMLIMIT and GOGC for kube-apiserver are now exposed through annotations on the HostedCluster.
      Result - ROSA HCP clusters with more than 51 worker nodes can now be upgraded.

      This is a clone of issue OCPBUGS-27817. The following is the description of the original issue:

      Description of problem:

      When ROSA HCP clusters with a large number of worker nodes (> 51) are upgraded, the cluster's Kube APIServer pods consume more memory than their nodes can provide and are OOMKilled.

      Version-Release number of selected component (if applicable):

         4.14, 4.15

      How reproducible:

          always

      Steps to Reproduce:

          1. Create a ROSA HCP cluster.
          2. Add 100 worker nodes to the cluster.
          3. Upgrade the cluster.
          

      Actual results:

          Kube APIServer pods are OOMKilled

      Expected results:

          Upgrade completes successfully

      Additional info:

          

            agarcial@redhat.com Alberto Garcia Lamela
            openshift-crt-jira-prow OpenShift Prow Bot
            He Liu He Liu