Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.12.z
Component/s: Networking / SR-IOV
Labels:

Severity: Important
Regression: No
Blocked: False
Blocked Reason: None
Customer Impact: Customer Escalated
Internal Whiteboard:
Latest Status Summary: 7/18: PR for OCPBUGS-14605 has been merged upstream. Waiting for backports, after which this bug will be closed.
RH Private Keywords:
Target Version: 4.12.z


Description of problem:


After a system reboot on a single-node OpenShift (SNO) cluster, SRIOV pods run into several problems.

SRIOV pods appear duplicated. The cause seems to be that SRIOV pods that existed before the reboot are not deleted properly: the old pod stays in UnexpectedAdmissionError while a replacement pod runs.

oc get pods:
...
test32-deployment-6b5c896c96-6r9kf     6/6     Running                    0          19m
test32-deployment-6b5c896c96-m5qzv     0/6     UnexpectedAdmissionError   0          32m
...

Version-Release number of selected component (if applicable):


First seen in 4.12.20. It was not observed before that.

How reproducible:


100% of the time after a system reboot, whether through a graceful shutdown or via Redfish.

Steps to Reproduce:

1. Deploy a DU application with several SRIOV pods
2. Reboot the system, either with a graceful shutdown (https://docs.openshift.com/container-platform/4.12/backup_and_restore/graceful-cluster-shutdown.html) or via Redfish
3. Check the SRIOV pods after the reboot

Actual results:


Some SRIOV pods appear duplicated, with the stale copy in an error state:

oc get pods:
...
test32-deployment-6b5c896c96-6r9kf     6/6     Running                    0          19m
test32-deployment-6b5c896c96-m5qzv     0/6     UnexpectedAdmissionError   0          32m
...

Expected results:


oc get pods --> all pods running normally, with no duplicated or failed SRIOV pods

Additional info:


System impact: Manual cleanup of SRIOV pods of the DU is required after any kind of reboot.
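
The manual cleanup mentioned above could be scripted roughly as follows. This is a minimal sketch, not the reporter's procedure: it assumes that cleanup means deleting every pod left in UnexpectedAdmissionError, and that the input is the plain-text table printed by oc get pods. The helper names failed_pods and delete_commands are hypothetical. The commands are only printed, not executed.

```python
# Sketch: find pods left in UnexpectedAdmissionError in `oc get pods` text output.
# Assumption: cleanup means deleting those pods; this is not confirmed by the report.
FAILED_STATUS = "UnexpectedAdmissionError"


def failed_pods(oc_get_pods_output: str) -> list[str]:
    """Return names of pods whose STATUS column equals FAILED_STATUS."""
    names = []
    for line in oc_get_pods_output.splitlines():
        fields = line.split()
        # Expected columns: NAME READY STATUS RESTARTS AGE
        if len(fields) >= 5 and fields[2] == FAILED_STATUS:
            names.append(fields[0])
    return names


def delete_commands(names: list[str]) -> list[str]:
    """Build one `oc delete pod` command per stale pod (nothing is executed)."""
    return [f"oc delete pod {name}" for name in names]


# The two pod lines quoted in this report.
SAMPLE = """\
test32-deployment-6b5c896c96-6r9kf     6/6     Running                    0          19m
test32-deployment-6b5c896c96-m5qzv     0/6     UnexpectedAdmissionError   0          32m
"""

if __name__ == "__main__":
    for cmd in delete_commands(failed_pods(SAMPLE)):
        print(cmd)
```

Printing the commands instead of running them leaves room to review the list first, which matters on a SNO cluster where the same check would have to be repeated after every reboot.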


Describe output for the old test32 pod:

oc describe pod/test32-deployment-6b5c896c96-m5qzv

  Warning  UnexpectedAdmissionError  25m   kubelet            Allocate failed due to no healthy devices present; cannot allocate unhealthy devices openshift.io/pci_sriov_net_llscu, which is unexpected
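
The warning says the kubelet found no healthy devices for the resource at admission time. One way to check this after a reboot is to read the node's allocatable count for that resource. The sketch below assumes the kubelet reports the number of healthy devices under status.allocatable in the node object (for example from oc get node <name> -o json). The sample node data is hypothetical and trimmed for illustration; it is not taken from this report, and allocatable_devices is a hypothetical helper name.

```python
import json

# Resource name taken from the kubelet warning in this report.
RESOURCE = "openshift.io/pci_sriov_net_llscu"


def allocatable_devices(node_json: str, resource: str = RESOURCE) -> int:
    """Return the node's allocatable count for an extended resource (0 if absent)."""
    node = json.loads(node_json)
    allocatable = node.get("status", {}).get("allocatable", {})
    return int(allocatable.get(resource, "0"))


# Hypothetical, trimmed node object for illustration only.
SAMPLE_NODE = json.dumps({
    "status": {
        "capacity": {RESOURCE: "4"},
        "allocatable": {RESOURCE: "0"},
    }
})

if __name__ == "__main__":
    # A count of 0 would match the "no healthy devices present" message.
    print(allocatable_devices(SAMPLE_NODE))
```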

Assignee: Balazs Nemeth

Reporter: Rodrigo Lopez Manrique (Inactive)

QA Contact: Zhanqi Zhao

Votes: 1

Watchers: 14

Created: 2023/06/05 8:32 AM

Updated: 2024/05/02 6:57 PM

Resolved: 2023/07/26 9:32 AM
