- Bug
- Resolution: Unresolved
- Undefined
- None
- 4.14
- None
- None
- 3
- NHE Sprint 261, NHE Sprint 262
- 2
- False
Description of problem:
As a workaround for https://issues.redhat.com/browse/OCPBUGS-14560, the pods carry tolerations intended to clean up pods that are hung or in a Not Ready state. The issue is that pods in the UnexpectedAdmissionError state are not cleaned up. Although manual cleanup is possible, the customer is concerned that this could be a problem in the production environment.
Tolerations in the pods:
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
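The manual cleanup mentioned above could be scripted roughly as follows. This is a minimal sketch of a workaround, not a fix for the underlying problem. It assumes the rejected pods show the literal STATUS value UnexpectedAdmissionError in the third column of `oc get pods --no-headers` output and that GNU `xargs` (for `-r`) is available. The function names `failed_admission_pods` and `cleanup_failed_admission_pods` and the namespace argument are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not product tooling.

# Read `oc get pods --no-headers` output on stdin and print the names of
# pods whose STATUS column (third field) is UnexpectedAdmissionError.
failed_admission_pods() {
  awk '$3 == "UnexpectedAdmissionError" { print $1 }'
}

# Delete every such pod in the namespace passed as the first argument.
# `xargs -r` skips the delete entirely when no pod matches.
cleanup_failed_admission_pods() {
  local ns="$1"
  oc get pods -n "$ns" --no-headers \
    | failed_admission_pods \
    | xargs -r oc delete pod -n "$ns"
}
```

Running `cleanup_failed_admission_pods <namespace>` after a reboot would remove only the rejected pods and leave the Running replicas untouched.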
Version-Release number of selected component (if applicable):
4.14.35
How reproducible:
100% of the time after a system reboot or graceful shutdown
Steps to Reproduce:
1. Deploy DU application with several SRIOV pods
2. Reboot the system (https://docs.openshift.com/container-platform/4.12/backup_and_restore/graceful-cluster-shutdown.html) or reboot via Redfish
3. Check SRIOV pods after reboot
Actual results:
$ oc get po -o wide
NAME                         READY   STATUS
dpdk-test-75487bddcc-d9zkp   0/1     UnexpectedAdmissionError
dpdk-test-75487bddcc-h2847   1/1     Running
dpdk-test-75487bddcc-rv2jw   1/1     Running
dpdk-test-75487bddcc-wkppc   0/1     UnexpectedAdmissionError
Expected results:
After the reboot, oc get pods should show all pods running normally, with none left in UnexpectedAdmissionError.
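The expected result could be checked automatically with something like the sketch below. It assumes the input is `oc get pods` output with a header row and the STATUS value in the third column, and it treats any status other than Running as a failure (so a Completed pod would also fail the check). The function name `all_pods_running` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: `all_pods_running` is a hypothetical helper name.

# Read `oc get pods` output (including the header row) on stdin.
# Return 0 if every pod's STATUS column (third field) is Running,
# and 1 if any pod reports a different status.
all_pods_running() {
  awk 'NR > 1 && $3 != "Running" { bad = 1 } END { exit bad + 0 }'
}
```

For example, `oc get po -o wide | all_pods_running && echo "all pods running"` would print the message only when no pod is left in UnexpectedAdmissionError or any other non-Running state.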
Additional info: