Bug
Resolution: Duplicate
Critical
None
4.13.z
Critical
No
Rejected
False
Description of problem:
The OCP 4.13.0 MNO (3+5) environment is not usable for other testing. All nodes print "Memory cgroup out of memory" on the BMC console/terminal. After logging in to a master node via SSH, even `ls -ltr` cannot be executed, and `kubectl` does not work either. After rebooting all the nodes from the BMC, the cluster starts working again. This looks like a memory leak issue.
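As a hypothetical way to confirm the symptom (assuming SSH to an affected node still responds), the cgroup OOM kills reported on the BMC console should also appear in the kernel log:

```shell
# Sketch only: grep the kernel ring buffer on an affected node for
# memory-cgroup OOM kill messages (the string seen on the BMC console).
journalctl -k --no-pager | grep -ci "memory cgroup out of memory"
```

A non-zero count indicates the kernel OOM killer is firing inside pod memory cgroups on that node.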
Version-Release number of selected component (if applicable):
How reproducible:
Please use the deployment file below and you will be able to reproduce the issue. Increase the number of replicas to 40 and the nodes will gradually start to go down.
Steps to Reproduce:
1. Create a deployment.yaml file with the content below:
~~~
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress-ng-test-limit
  labels:
    k8s-app: stress-ng-test-limit
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stress-ng-test-limit
  template:
    metadata:
      name: stress-ng-test-limit
      labels:
        app: stress-ng-test-limit
    spec:
      serviceAccount: stress-ng-sa
      containers:
      - name: stress-ng-test-container-limit
        image: quay.io/dmoessne/stress-ng-test:0.2
        command: [ "/bin/bash", "-c", "--" ]
        args: [ "stress-ng -c 1 --vm 32 --vm-bytes 100% --vm-method all --madvise 2" ]
        #command: ["sleep", "infinity"]
        resources:
          requests:
            memory: 100Mi
          limits:
            memory: "1G"
        securityContext:
          seccompProfile:
            type: RuntimeDefault
          capabilities:
            drop:
            - ALL
          privileged: true
~~~
2. Create the deployment:
~~~
[quickcluster@upi-0 ~]$ oc create -f deployment.yaml
deployment.apps/stress-ng-test-limit created
~~~
3. Verify that the pod is running:
~~~
[quickcluster@upi-0 ~]$ oc get all
NAME                                        READY   STATUS    RESTARTS   AGE
pod/stress-ng-test-limit-59c59dbf65-j6z84   1/1     Running   0          59s

NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/stress-ng-test-limit   1/1     1            1           59s

NAME                                              DESIRED   CURRENT   READY   AGE
replicaset.apps/stress-ng-test-limit-59c59dbf65   1         1         1       59s
~~~
4. Increase replicas to 40:
~~~
oc scale --replicas=40 deployment.apps/stress-ng-test-limit
~~~
5. After some time the nodes will go down, the OpenShift web console will stop working, and operators will start behaving abnormally.
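For context, a back-of-envelope calculation (plain arithmetic, no cluster needed) of what the scaled deployment is allowed to consume. Each pod has a 1G memory limit and a 100Mi request, so at 40 replicas:

```shell
# Aggregate memory footprint of the deployment at 40 replicas.
# Values approximated in decimal MB for simplicity.
REPLICAS=40
LIMIT_MB=1000    # 1G limit per pod (from the manifest)
REQUEST_MB=100   # 100Mi request per pod, approximated as 100 MB
echo "total limit:   $((REPLICAS * LIMIT_MB)) MB"    # prints 40000 MB
echo "total request: $((REPLICAS * REQUEST_MB)) MB"  # prints 4000 MB
```

So the deployment may legitimately try to use ~40 GB across the workers, but each pod that exceeds its 1G limit should be OOM-killed inside its own memory cgroup rather than taking the whole node down.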
Actual results:
Nodes go down and remain unusable until rebooted from the BMC.
Expected results:
During stress testing, nodes should not go down; pods exceeding their memory limit should be OOM-killed within their own cgroup.
Additional info:
is blocked by: OCPBUGS-15102 "All burstable pods run with the reserved cpu affinity mask when PerformanceProfile is applied" (Closed)