- Bug
- Resolution: Won't Do
- Major
- None
- 4.12.z
- Important
- No
- Rejected
- False
Description of problem:
Initially we noticed an issue where all the VFs were shown zeroed out, but we understand this is expected; https://issues.redhat.com/browse/RHEL-3398 has been filed to check with Nvidia on why this is happening. Regardless of that, we are still unable to pass traffic between pods with hardware offload. We have also observed that the MTU is set improperly, and there are open questions about link-state and about the documentation and instructions.

YAML files used for configuration (additional configs attached):

sriov-pool-config.yaml:
{code:none}
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkPoolConfig
metadata:
  name: offload-sriovnetworkpoolconfig
  namespace: openshift-sriov-network-operator
spec:
  ovsHardwareOffloadConfig:
    name: mcp-offloading
{code}

mcp-offloading.yaml:
{code:none}
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: mcp-offloading
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,mcp-offloading]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/mcp-offloading: ""
{code}

The SriovNetworkNodePolicies and NetworkAttachmentDefinitions are attached. I followed the directions found at [Configuring hardware offloading - Hardware networks | Networking | OpenShift Container Platform 4.12|https://docs.openshift.com/container-platform/4.12/networking/hardware_networks/configuring-hardware-offloading.html] for configuring one of our NICs (Mellanox ConnectX-6 Dx) for hardware offloading.
After the configuration is complete, I see that the VFs for the configured network all have zeroed-out MAC addresses, as seen below:

{code:none}
10: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether b8:ce:f6:91:b2:38 brd ff:ff:ff:ff:ff:ff
    vf 0 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 1 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 2 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 3 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 4 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 5 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 6 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 7 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 8 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 9 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 10 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 11 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 12 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 13 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 14 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 15 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 16 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 17 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 18 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 19 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 20 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 21 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 22 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 23 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 24 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 25 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 26 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 27 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 28 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 29 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
{code}

The other items of note are that the MTU is expected to be 9000 based on the SriovNetworkNodePolicy for this interface, and that the link-state was previously set to auto rather than disable. Prior to the hardware offloading configuration, the interface in question had populated MAC addresses and MTU 9000. For example:

{code:none}
11: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether b8:ce:f6:91:b2:39 brd ff:ff:ff:ff:ff:ff
    vf 0 link/ether b2:d3:6a:8c:92:86 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 1 link/ether f6:b8:24:46:5d:d9 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 2 link/ether 4e:65:6d:b5:20:3d brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 3 link/ether ca:57:a0:0a:b6:d2 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 4 link/ether 82:43:95:e9:31:ea brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 5 link/ether 3a:55:fa:97:94:d4 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 6 link/ether 9e:fa:a4:15:7a:ee brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 7 link/ether 4e:15:b7:14:55:f2 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 8 link/ether ba:5f:6e:d1:f3:35 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on, query_rss off
    vf 9 link/ether 36:af:99:b9:f8:f0 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on, query_rss off
    vf 10 link/ether c2:45:c2:c9:91:02 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 11 link/ether b2:4e:7b:93:ac:17 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 12 link/ether 0e:a9:88:8a:d0:06 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 13 link/ether b2:0a:7a:f8:12:f7 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 14 link/ether 4e:ae:72:94:76:e5 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 15 link/ether 32:7a:18:28:bb:61 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 16 link/ether f6:0d:78:ba:db:25 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 17 link/ether ba:c1:97:e8:4c:fd brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 18 link/ether aa:b5:16:54:88:ea brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 19 link/ether fe:2e:ee:82:83:ec brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 20 link/ether 2a:35:c4:d6:11:83 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on, query_rss off
    vf 21 link/ether 9a:c7:50:6a:3e:0d brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 22 link/ether 82:b9:1f:4e:c1:e3 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 23 link/ether 96:48:1d:12:bc:70 brd ff:ff:ff:ff:ff:ff, vlan 181, spoof checking off, link-state auto, trust on, query_rss off
    vf 24 link/ether 86:74:76:dd:fb:59 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 25 link/ether 52:03:44:f1:64:15 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 26 link/ether 02:8c:bd:9b:4f:af brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 27 link/ether 1a:09:ce:87:13:2c brd ff:ff:ff:ff:ff:ff, vlan 3216, spoof checking off, link-state auto, trust on, query_rss off
    vf 28 link/ether d6:e1:4f:33:bc:3b brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on, query_rss off
    vf 29 link/ether 62:03:53:f6:2f:c0 brd ff:ff:ff:ff:ff:ff, vlan 3611, spoof checking off, link-state auto, trust on, query_rss off
{code}

I have been able to deploy pods using the network interface, but the attached interface on the pod side is showing NO-CARRIER with a DOWN state, which is not expected as far as I know. See below:

{code:none}
101: net1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether e2:4e:9c:1a:ad:0e brd ff:ff:ff:ff:ff:ff
    inet 10.1.11.2/24 brd 10.1.11.255 scope global net1
       valid_lft forever preferred_lft forever
{code}

The cluster is running OCP 4.12.30, and the driver and firmware information is below:

{code:none}
root@greig-w1-ens1f0-mlx5-client:/# ethtool -i net1
driver: mlx5_core
version: 4.18.0-372.69.1.el8_6.x86_64
firmware-version: 22.35.2000 (MT_0000000359)
expansion-rom-version:
bus-info: 0000:3b:00.7
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
{code}

NIC information:

{code:none}
[core@worker1 ~]$ lspci -nn -v -s 0000:3b:00.0
3b:00.0 Ethernet controller [0200]: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:101d]
        Subsystem: Mellanox Technologies Device [15b3:0016]
        Flags: bus master, fast devsel, latency 0, IRQ 173, NUMA node 0, IOMMU group 65
        Memory at ae000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at ab000000 [disabled] [size=1M]
        Capabilities: <access denied>
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core
{code}

I considered that it might be a firmware compatibility issue, so I tried to upgrade to 24.35.1012 (MT_0000000540) based on information found at [Network Adapter Fast Datapath Feature Support Matrix - Red Hat Customer Portal|https://access.redhat.com/articles/3538141#tested-firmware-2], but could not find that version on the Nvidia firmware download page.
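One thing worth noting on the link-state question: our NetworkAttachmentDefinitions are written by hand, so nothing in them sets the VF link-state. If I understand the SR-IOV operator correctly, defining an SriovNetwork CR instead would let the operator render the NAD and expose a linkState field. A sketch under that assumption (the CR name is made up here; resourceName, namespace, and subnet are copied from our policy1 configs; untested):

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: w1-ens1f0-policy1-sriov-network   # hypothetical name
  namespace: openshift-sriov-network-operator
spec:
  resourceName: W1Ens1f0Mlx5NetdevPolicy1
  networkNamespace: spk-test-automation    # where the rendered NAD would land
  linkState: auto                          # the setting we expected on the VFs
  ipam: '{"type": "host-local", "subnet": "10.1.11.0/24"}'
```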
In the documentation for adding the annotation to the pod, the instructions are under the heading [Adding the network attachment definition to your pods|https://docs.openshift.com/container-platform/4.12/networking/hardware_networks/configuring-hardware-offloading.html#adding-network-attachment-definition-to-pods_configuring-hardware-offloading]. The part I'm hung up on is that the text after the bullet point says to "add the .metadata.annotations.k8s.v1.cni.cncf.io/networks field", but the example code uses the "v1.multus-cni.io/default-network" annotation instead. Is that difference an error, or is there something I'm missing? When I use the v1.multus-cni.io/default-network annotation, pod creation succeeds, but the net1 interface I'm expecting is not created. When I use the .metadata.annotations.k8s.v1.cni.cncf.io/networks annotation, the net1 interface is created in the pod, but I am unable to ping from one pod to another and net1 shows a NO-CARRIER state. I feel like I'm missing something, but I don't know what it is.
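For reference, this is the shape of pod manifest I have been using with the k8s.v1.cni.cncf.io/networks annotation (the pod name, image, and command below are placeholders, not our actual workload; the NAD name is from our attached configs):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sriov-test-pod                     # placeholder name
  namespace: spk-test-automation
  annotations:
    # Attach the secondary network; Multus adds it as net1 in the pod.
    k8s.v1.cni.cncf.io/networks: w1-ens1f0-policy1-net-attach-def
spec:
  containers:
  - name: test
    image: registry.access.redhat.com/ubi8/ubi:latest   # placeholder image
    command: ["sleep", "infinity"]
```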
Also, when I use the network type "ovn-k8s-cni-overlay" as given in the network attachment definition, pod creation fails with the error below:

{code:none}
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_greig-w1-ens1f0-mlx5-u2004-client_spk-test-automation_6d32f0ab-4e2e-4a4c-b6a9-d6933faeb969_0(022e338885656755f36bdfeb880239e67c9eef53bb7b6d7a6a00ac0d8c946b1f): error adding pod spk-test-automation_greig-w1-ens1f0-mlx5-u2004-client to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [spk-test-automation/greig-w1-ens1f0-mlx5-u2004-client/6d32f0ab-4e2e-4a4c-b6a9-d6933faeb969:w1-ens1f0-mlx5-netdev-180]: error adding container to network "w1-ens1f0-mlx5-netdev-180": SRIOV-CNI failed to configure VF "failed to set vf 15 vlan configuration: operation not supported"
{code}

It only seems to work if I use "sriov" as the network type, but then I run into the pod-to-pod ping issues that I previously mentioned.
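To be explicit about what was attempted with the overlay type, the NAD was the same as the attached sriov one but with the type swapped. Roughly this shape (reconstructed from memory, so the exact ipam/dns values may differ from what the linked guide intends, which could itself be part of the problem):

```yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: w1-ens1f0-policy1-net-attach-def
  namespace: spk-test-automation
  annotations:
    k8s.v1.cni.cncf.io/resourceName: openshift.io/W1Ens1f0Mlx5NetdevPolicy1
spec:
  # Same NAD as before, but with the CNI type changed to the overlay plugin.
  config: '{"cniVersion":"0.3.1", "name":"w1-ens1f0-policy1-net-attach-def",
            "type":"ovn-k8s-cni-overlay", "ipam":{}, "dns":{} }'
```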
Version-Release number of selected component (if applicable):
4.12.30
How reproducible:
Always
Steps to Reproduce:
1. Follow the configuration steps at https://docs.openshift.com/container-platform/4.12/networking/hardware_networks/configuring-hardware-offloading.html
w1-ens1f0-mlx5-netdev-policy2.yaml:
{code:none}
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: w1-ens1f0-mlx5-netdev-policy2
  namespace: openshift-sriov-network-operator
spec:
  resourceName: W1Ens1f0Mlx5NetdevPolicy2
  nodeSelector:
    kubernetes.io/hostname: "worker1.ndal-ocp3.ve.pd.f5net.com"
  numVfs: 30
  mtu: 9000
  nicSelector:
    vendor: "15b3"
    name: "ens1f0"
    deviceID: "101d"
    rootDevices: ["0000:3b:00.0"]
    pfNames: ["ens1f0#16-29"]
  deviceType: "netdevice"
  isRdma: true
  eSwitchMode: "switchdev"
{code}
w1-ens1f0-mlx5-netdev-policy1.yaml:
{code:none}
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: w1-ens1f0-mlx5-netdev-policy1
  namespace: openshift-sriov-network-operator
spec:
  resourceName: W1Ens1f0Mlx5NetdevPolicy1
  nodeSelector:
    kubernetes.io/hostname: "worker1.ndal-ocp3.ve.pd.f5net.com"
  numVfs: 30
  mtu: 9000
  nicSelector:
    vendor: "15b3"
    name: "ens1f0"
    deviceID: "101d"
    rootDevices: ["0000:3b:00.0"]
    pfNames: ["ens1f0#2-15"]
  deviceType: "netdevice"
  isRdma: true
  eSwitchMode: "switchdev"
{code}
w1-ens1f0-mlx5-2-net-attach-def.yaml:
{code:none}
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: w1-ens1f0-policy2-net-attach-def
  namespace: spk-test-automation
  annotations:
    k8s.v1.cni.cncf.io/resourceName: openshift.io/W1Ens1f0Mlx5NetdevPolicy2
spec:
  config: '{"cniVersion":"0.3.1", "name":"w1-ens1f0-policy2-net-attach-def", "type":"sriov", "ipam":{"type": "host-local", "subnet": "10.1.12.0/24"}, "dns":{} }'
{code}
w1-ens1f0-mlx5-1-net-attach-def.yaml:
{code:none}
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: w1-ens1f0-policy1-net-attach-def
  namespace: spk-test-automation
  annotations:
    k8s.v1.cni.cncf.io/resourceName: openshift.io/W1Ens1f0Mlx5NetdevPolicy1
spec:
  config: '{"cniVersion":"0.3.1", "name":"w1-ens1f0-policy1-net-attach-def", "type":"sriov", "ipam":{"type": "host-local", "subnet": "10.1.11.0/24"}, "dns":{} }'
{code}