  1. OpenShift Bugs
  2. OCPBUGS-16264

openshift-apiserver OOM after openshift/api:release-4.14 bump

Details

    • Bug
    • Resolution: Not a Bug
    • Undefined
    • None
    • 4.14.0
    • openshift-apiserver
    • None
    • Critical
    • No
    • Proposed
    • False
    Description

      Description of problem:

      Building an openshift-apiserver image from master:HEAD (e321623b22888edcf608b03d8aae9d2d9a38c799) after bumping openshift/api to the latest release-4.14:HEAD (v0.0.0-20230714214528-de6ad7979b00) causes the newly deployed openshift-apiserver instances to run out of memory (OOM).
      

      Version-Release number of selected component (if applicable):

      
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Deploy 4.14 with openshift-installer (openshift-install-linux-4.14.0-0.nightly-2023-06-29-065352.tar.gz)
      2. Disable CVO
      3. Update the openshift-apiserver-operator deployment so that its IMAGE env variable points to the newly built openshift-apiserver image that contains the bump
      4. Wait for 5-10 minutes
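
      Steps 2-4 could be scripted roughly as follows. This is a sketch under assumptions, not the reporter's exact commands: the namespaces and deployment names (openshift-cluster-version/cluster-version-operator, openshift-apiserver-operator/openshift-apiserver-operator) are the usual defaults but are not stated in this report, and NEW_IMAGE is a hypothetical placeholder pull spec. The script only prints the commands unless DRY_RUN=0 is set, since scaling down the CVO is disruptive.

```shell
#!/usr/bin/env bash
# Sketch of reproduction steps 2-4. Prints commands by default; DRY_RUN=0 executes them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
# Hypothetical placeholder: replace with the pull spec of the image you built.
NEW_IMAGE="${NEW_IMAGE:-quay.io/example/openshift-apiserver:api-bump}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 2: scale the CVO to zero so it does not revert the operator deployment
# (assumed way of "disabling" it; the report does not say how it was done).
run oc scale deployment/cluster-version-operator -n openshift-cluster-version --replicas=0

# Step 3: point the operator's IMAGE env variable at the new image
# (namespace and deployment name are assumptions).
run oc set env deployment/openshift-apiserver-operator -n openshift-apiserver-operator "IMAGE=${NEW_IMAGE}"

# Step 4: watch the rollout while checking memory on the masters.
run oc get pods -n openshift-apiserver -o wide
```

      Running it as is shows the three oc invocations without touching the cluster; rerun with DRY_RUN=0 and NEW_IMAGE set to apply them.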
      

      Actual results:

      After openshift-apiserver consumes all the available memory, the corresponding master node stops reporting its status (through kubelet) and is eventually reported as NotReady. After some time a second master node experiences the same scenario, eventually bringing the control plane down.
      
      

      Expected results:

      openshift-apiserver does not consume all the available memory after being redeployed with the latest openshift/api@master:HEAD dependency.
      
      

      Additional info:

      • Masters in this order: ip-10-0-134-26.ec2.internal, ip-10-0-148-75.ec2.internal, ip-10-0-166-146.ec2.internal
        None of the openshift-apiserver pods specifies a resource limit for the openshift-apiserver container.
        Memory on each master before deploying the new image (via free -mh):
                       total        used        free      shared  buff/cache   available
        Mem:            15Gi       5.2Gi       898Mi        68Mi       9.7Gi        10Gi
                       total        used        free      shared  buff/cache   available
        Mem:            15Gi       6.5Gi       276Mi        70Mi       9.0Gi       8.8Gi
                       total        used        free      shared  buff/cache   available
        Mem:            15Gi       7.1Gi       193Mi        73Mi       8.5Gi       8.3Gi
        

      Running `ps -e -o pid,vsz,comm= | sort -n -k 2 | tail -4` on all master nodes before:

        37443 1937908 kube-apiserver
         2136 2001464 kubelet
        23612 2932608 python3
        35376 10715276 etcd
      
         2111 2464088 crio
        54864 2688756 kube-apiserver
        33492 4966852 machine-control
        63170 10715404 etcd
      
        37992 2005256 kube-apiserver
         2112 2257196 crio
        28008 2932600 python3
        42572 10647176 etcd
      

      After:

                     total        used        free      shared  buff/cache   available
      Mem:            15Gi       5.3Gi       566Mi        68Mi         9Gi        10Gi
                     total        used        free      shared  buff/cache   available
      Mem:            15Gi       6.7Gi       362Mi        70Mi       8.7Gi       8.6Gi
                     total        used        free      shared  buff/cache   available
      Mem:            15Gi        13Gi       155Mi        73Mi       2.1Gi       1.8Gi
      
         2136 2001720 kubelet
        37443 2005064 kube-apiserver
        23612 2932608 python3
        35376 10715532 etcd
      
         2111 2464344 crio
        54864 2689524 kube-apiserver
        33492 4966852 machine-control
        63170 10715404 etcd
      
         2112 2257196 crio
        28008 2932600 python3
        42572 10647240 etcd
        90253 820723400 openshift-apise
      
      NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                           NOMINATED NODE   READINESS GATES
      apiserver-79f98448c9-9ztm6   2/2     Running   0          5m58s   10.129.0.74   ip-10-0-148-75.ec2.internal    <none>           <none>
      apiserver-79f98448c9-rd9hp   2/2     Running   0          4m27s   10.128.0.37   ip-10-0-134-26.ec2.internal    <none>           <none>
      apiserver-79f98448c9-xf7rl   2/2     Running   0          7m28s   10.130.0.61   ip-10-0-166-146.ec2.internal   <none>           <none>
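
      The vsz column printed by ps is in KiB, so the raw numbers above are hard to compare by eye. The sketch below converts two rows copied from the third master's "After" output into GiB. The helper name vsz_to_gib is made up for illustration. Note that VSZ is virtual address space rather than resident memory; the actual pressure on that node is the rise in used memory from 7.1Gi to 13Gi in the free output. Sorting on the rss column instead (ps -e -o pid,rss,comm=) would likely give a more direct view of resident usage.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Convert the VSZ column (KiB, as printed by `ps -o pid,vsz,comm=`) to GiB.
# Input rows: PID VSZ COMMAND. Output rows: COMMAND <size> GiB.
vsz_to_gib() {
  awk '{ printf "%s %.1f GiB\n", $3, $2 / 1048576 }'
}

# Rows copied from the third master's output after the new image was deployed.
vsz_to_gib <<'EOF'
42572 10647240 etcd
90253 820723400 openshift-apise
EOF
# prints:
#   etcd 10.2 GiB
#   openshift-apise 782.7 GiB
```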
      

            People

              vrutkovs@redhat.com Vadim Rutkovsky
              jchaloup@redhat.com Jan Chaloupka
              Rahul Gangwar Rahul Gangwar