Bug
Resolution: Not a Bug
Affects version: 4.14.0
Critical
Description of problem:
Building an openshift-apiserver image from master:HEAD (e321623b22888edcf608b03d8aae9d2d9a38c799) after bumping openshift/api to the latest release-4.14:HEAD (v0.0.0-20230714214528-de6ad7979b00) leads to the newly deployed openshift-apiserver instances running out of memory (OOM).
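For reference, the bump amounts to pointing openshift-apiserver's go.mod at the pseudo-version quoted above and re-vendoring before rebuilding the image (a sketch of the relevant lines only, the exact change may differ):

  require github.com/openshift/api v0.0.0-20230714214528-de6ad7979b00

  go mod tidy && go mod vendor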
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Deploy 4.14 with openshift-installer (openshift-install-linux-4.14.0-0.nightly-2023-06-29-065352.tar.gz)
2. Disable CVO
3. Update the openshift-apiserver-operator deployment to point the IMAGE env at the newly built image with the openshift-apiserver bump (example commands for steps 2-3 below)
4. Wait for 5-10 minutes
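One way to carry out steps 2-3, assuming an oc client with cluster-admin access (the image pullspec is a placeholder):

  # step 2: scale the Cluster Version Operator to zero so it does not revert the override
  oc scale deployment/cluster-version-operator -n openshift-cluster-version --replicas=0
  # step 3: point the operator's IMAGE env at the newly built openshift-apiserver image
  oc set env deployment/openshift-apiserver-operator -n openshift-apiserver-operator IMAGE=<newly-built-openshift-apiserver-pullspec>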
Actual results:
After openshift-apiserver eats all the available memory, the corresponding master node stops reporting its status (through kubelet) and eventually gets reported as NotReady. After some time a second master node experiences the same scenario, eventually bringing the control plane down.
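One way to watch this progression (a generic sketch, not tooling from this report; <master-name> is a placeholder):

  # node conditions flip to NotReady once kubelet stops posting status
  oc get nodes -w
  # node-level memory view on an affected master
  oc debug node/<master-name> -- chroot /host free -mh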
Expected results:
openshift-apiserver does not consume all the available memory after being re-deployed with the latest openshift/api@master:HEAD dependency.
Additional info:
- Masters, in the order used for the outputs below: ip-10-0-134-26.ec2.internal, ip-10-0-148-75.ec2.internal, ip-10-0-166-146.ec2.internal
None of the openshift-apiserver pods specify a resource limit for the openshift-apiserver container.
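The missing limits can be confirmed with something like the following (container name as noted above; the jsonpath filter is just one way to express it):

  oc get pods -n openshift-apiserver -o jsonpath='{range .items[*]}{.metadata.name}{"  limits="}{.spec.containers[?(@.name=="openshift-apiserver")].resources.limits}{"\n"}{end}'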
Masters' memory before deploying the new image (via free -mh; one Mem: line per master, in the order listed above):
        total   used   free   shared  buff/cache  available
Mem:    15Gi    5.2Gi  898Mi  68Mi    9.7Gi       10Gi
Mem:    15Gi    6.5Gi  276Mi  70Mi    9.0Gi       8.8Gi
Mem:    15Gi    7.1Gi  193Mi  73Mi    8.5Gi       8.3Gi
Running `ps -e -o pid,vsz,comm= | sort -n -k 2 | tail -4` on all master nodes before:
Master 1:
  37443    1937908  kube-apiserver
   2136    2001464  kubelet
  23612    2932608  python3
  35376   10715276  etcd
Master 2:
   2111    2464088  crio
  54864    2688756  kube-apiserver
  33492    4966852  machine-control
  63170   10715404  etcd
Master 3:
  37992    2005256  kube-apiserver
   2112    2257196  crio
  28008    2932600  python3
  42572   10647176  etcd
After:

free -mh (same master order):
        total   used   free   shared  buff/cache  available
Mem:    15Gi    5.3Gi  566Mi  68Mi    9Gi         10Gi
Mem:    15Gi    6.7Gi  362Mi  70Mi    8.7Gi       8.6Gi
Mem:    15Gi    13Gi   155Mi  73Mi    2.1Gi       1.8Gi

ps -e -o pid,vsz,comm= | sort -n -k 2 | tail -4 (same master order):
Master 1:
   2136    2001720  kubelet
  37443    2005064  kube-apiserver
  23612    2932608  python3
  35376   10715532  etcd
Master 2:
   2111    2464344  crio
  54864    2689524  kube-apiserver
  33492    4966852  machine-control
  63170   10715404  etcd
Master 3:
   2112     2257196  crio
  28008     2932600  python3
  42572    10647240  etcd
  90253  820723400  openshift-apise
openshift-apiserver pods:
NAME                         READY  STATUS   RESTARTS  AGE    IP           NODE                          NOMINATED NODE  READINESS GATES
apiserver-79f98448c9-9ztm6   2/2    Running  0         5m58s  10.129.0.74  ip-10-0-148-75.ec2.internal   <none>          <none>
apiserver-79f98448c9-rd9hp   2/2    Running  0         4m27s  10.128.0.37  ip-10-0-134-26.ec2.internal   <none>          <none>
apiserver-79f98448c9-xf7rl   2/2    Running  0         7m28s  10.130.0.61  ip-10-0-166-146.ec2.internal  <none>          <none>
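Container-level memory consumption for the pods above can also be sampled from cluster metrics, e.g. (assuming metrics are available):

  oc adm top pods -n openshift-apiserver --containers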
- blocks: WRKLDS-728 Capabilities: enable/disable API based on capabilities (OAS + OASO) (Closed)
- links to