- Bug
- Resolution: Won't Do
- Critical
- None
- 4.9.z
- Moderate
- No
- 5
- NHE Sprint 237
- 1
- False
Description of problem:
After applying the manifests for SR-IOV configuration, the MCP fails to update.
Version-Release number of selected component (if applicable):
4.10.60
How reproducible:
100%
Steps to Reproduce:
1. Deploy a baremetal OCP cluster.
2. Create MCPs: one for regular worker nodes and a second for SR-IOV workloads.
3. Apply the following manifests:

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  configDaemonNodeSelector:
    node-role.kubernetes.io/sriov: ""
  enableInjector: true
  enableOperatorWebhook: true
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: "sriovleftdpdkmellanox"
  namespace: openshift-sriov-network-operator
spec:
  resourceName: "sriovleftdpdkmellanox"
  nodeSelector:
    node-role.kubernetes.io/sriov: ""
  mtu: 9000
  numVfs: 4
  nicSelector:  # consider switching to PCI paths
    pfNames: ['ens4f0', 'ens5f0']
  deviceType: netdevice
  isRdma: true
Actual results:
The node is stuck in SchedulingDisabled and the MCP is marked as `paused`.
Expected results:
SR-IOV is successfully configured and the nodes are rebooted.
Additional info:
Notes from Sebastian:
---------------------
The problem is that we now also need to drain after a reboot when the device count is 0, but we cannot acquire the drain lock: while this node was rebooting, another node already took the lock, so we end up in a deadlock.
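The deadlock in the note above can be illustrated with a minimal sketch. This is a hypothetical model, not the sriov-network-operator's actual code: it assumes a single cluster-wide drain lock with one holder at a time, and two nodes ("node-a", "node-b" are made-up names) that both need to drain.

```python
class DrainLock:
    """Toy model of a single cluster-wide drain lock (one holder at a time)."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, node):
        # Acquisition succeeds only if the lock is free or already ours.
        if self.holder in (None, node):
            self.holder = node
            return True
        return False


lock = DrainLock()

# While node-a was rebooting, node-b grabbed the drain lock for its own update.
assert lock.try_acquire("node-b")

# node-a comes back up with 0 devices configured, so it must drain again,
# but it cannot take the lock while node-b holds it ...
can_drain_a = lock.try_acquire("node-a")
print(can_drain_a)  # -> False

# ... and node-b's update is in turn blocked waiting on node-a, so the lock
# is never released: a deadlock, and the MCP update stalls.
```

Neither node can make progress, which matches the observed symptom: the node stays SchedulingDisabled and the MCP never finishes updating.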
- clones: OCPBUGS-14360 [4.10.z] MCP fails to update after SR-IOV drains the node (Closed)
- depends on: OCPBUGS-14360 [4.10.z] MCP fails to update after SR-IOV drains the node (Closed)