- Bug
- Resolution: Won't Do
- Critical
- None
- 4.9.z
- Moderate
- No
- 5
- NHE Sprint 237
- 1
- False
Description of problem:
After applying the manifests for SR-IOV configuration, the MCP fails to update.
Version-Release number of selected component (if applicable):
4.10.60
How reproducible:
100%
Steps to Reproduce:
1. Deploy a baremetal OCP cluster.
2. Create MCPs: one for regular worker nodes and a second for SR-IOV workloads.
3. Apply the following manifests:

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  configDaemonNodeSelector:
    node-role.kubernetes.io/sriov: ""
  enableInjector: true
  enableOperatorWebhook: true
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: "sriovleftdpdkmellanox"
  namespace: openshift-sriov-network-operator
spec:
  resourceName: "sriovleftdpdkmellanox"
  nodeSelector:
    node-role.kubernetes.io/sriov: ""
  mtu: 9000
  numVfs: 4
  nicSelector:  # consider switching to PCI paths
    pfNames: ['ens4f0', 'ens5f0']
  deviceType: netdevice
  isRdma: true
Actual results:
The node is stuck in SchedulingDisabled and the MCP is marked as `paused`.
Expected results:
SR-IOV is successfully configured and the nodes are rebooted.
Additional info:
Notes from Sebastian:
---------------------
The problem is that we now also need to drain after a reboot when the device count is 0, but we cannot acquire the drain lock: while this node was rebooting, another node already took the lock, so we end up in a deadlock.
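The deadlock in the note above can be illustrated with a minimal sketch. This is a hypothetical model, not the sriov-network-operator's actual code: it assumes a single cluster-wide drain lock with one holder at a time, and two nodes ("node-a", "node-b" are made-up names) that both need to drain.

```python
class DrainLock:
    """Toy model of a single cluster-wide drain lock (one holder at a time)."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, node):
        # Acquisition succeeds only if the lock is free or already ours.
        if self.holder in (None, node):
            self.holder = node
            return True
        return False


lock = DrainLock()

# While node-a was rebooting, node-b grabbed the drain lock for its own update.
assert lock.try_acquire("node-b")

# node-a comes back up with 0 devices configured, so it must drain again,
# but it cannot take the lock while node-b holds it ...
can_drain_a = lock.try_acquire("node-a")
print(can_drain_a)  # -> False

# ... and node-b's update is in turn blocked waiting on node-a, so the lock
# is never released: a deadlock, and the MCP update stalls.
```

Neither node can make progress, which matches the observed symptom: the node stays SchedulingDisabled and the MCP never finishes updating.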
- clones: OCPBUGS-14360 [4.10.z] MCP fails to update after SR-IOV drains the node (Closed)
- depends on: OCPBUGS-14360 [4.10.z] MCP fails to update after SR-IOV drains the node (Closed)