Bug
Resolution: Done
4.12.z
Description of problem:
After a system reboot on a single-node OpenShift (SNO) cluster, SRIOV pods appear duplicated. The problem seems to be that the SRIOV pods that existed before the reboot are not deleted properly.

oc get pods:
  ...
  test32-deployment-6b5c896c96-6r9kf   6/6   Running                    0   19m
  test32-deployment-6b5c896c96-m5qzv   0/6   UnexpectedAdmissionError   0   32m
  ...
Version-Release number of selected component (if applicable):
New bug, first seen in 4.12.20. It had not been seen before.
How reproducible:
100% of the time after a system reboot, whether through a graceful shutdown or via Redfish.
Steps to Reproduce:
1. Deploy a DU application with several SRIOV pods.
2. Reboot the system, either gracefully (https://docs.openshift.com/container-platform/4.12/backup_and_restore/graceful-cluster-shutdown.html) or via Redfish.
3. Check the SRIOV pods after the reboot.
Actual results:
Some SRIOV pods appear duplicated, with the older copy in an error state.

oc get pods:
  ...
  test32-deployment-6b5c896c96-6r9kf   6/6   Running                    0   19m
  test32-deployment-6b5c896c96-m5qzv   0/6   UnexpectedAdmissionError   0   32m
  ...
Expected results:
oc get pods shows all pods running normally, with no duplicated or failed SRIOV pods.
Additional info:
System impact: The DU's SRIOV pods must be cleaned up manually after any kind of reboot.

Describe output for the old test32 pod (oc describe pod/test32-deployment-6b5c896c96-m5qzv):
  Warning  UnexpectedAdmissionError  25m  kubelet  Allocate failed due to no healthy devices present; cannot allocate unhealthy devices openshift.io/pci_sriov_net_llscu, which is unexpected
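
For reference, the manual cleanup mentioned above could be scripted roughly as follows. This is only a sketch, not a fix for the underlying bug: it assumes the default column layout of oc get pods (NAME READY STATUS RESTARTS AGE) and that deleting the stale pod is the intended cleanup. The helper names find_failed_pods and cleanup_commands are hypothetical, and the sample text is the output quoted in this report.

```python
from __future__ import annotations


def find_failed_pods(oc_output: str,
                     status: str = "UnexpectedAdmissionError") -> list[str]:
    """Return the names of pods whose STATUS column equals `status`.

    Assumes the default `oc get pods` columns: NAME READY STATUS RESTARTS AGE.
    """
    failed = []
    for line in oc_output.splitlines():
        fields = line.split()
        # Header, "..." and blank lines never match the status, so they are skipped.
        if len(fields) >= 5 and fields[2] == status:
            failed.append(fields[0])
    return failed


def cleanup_commands(pod_names: list[str]) -> list[str]:
    """Build one `oc delete pod` command per stale pod (printed, not executed)."""
    return [f"oc delete pod {name}" for name in pod_names]


# Sample taken from the `oc get pods` output in this report.
sample = """\
NAME READY STATUS RESTARTS AGE
test32-deployment-6b5c896c96-6r9kf 6/6 Running 0 19m
test32-deployment-6b5c896c96-m5qzv 0/6 UnexpectedAdmissionError 0 32m
"""

if __name__ == "__main__":
    for cmd in cleanup_commands(find_failed_pods(sample)):
        print(cmd)
```

The sketch only prints the delete commands so they can be reviewed before being run by hand in the DU namespace.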