Type: Bug
Resolution: Unresolved
Affects Version: 4.12
Severity: Important
Impact: Quality / Stability / Reliability
Sprints: CNF Network Sprint 271, CNF Network Sprint 280
Description of problem:
When using VFs from Mellanox ConnectX-6 Lx cards, there is a high percentage of packet loss.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Always
Steps to Reproduce:
1. Deploy a bare-metal cluster with LACP bonding.
2. Create an SriovNetworkNodePolicy targeting the Mellanox ConnectX-6 Lx cards.
3. Deploy pods on two different nodes using VFs from the previous policy.
4. Run ICMP tests between the pods over the VF interface (see the sketch after this list); you will observe a packet loss rate above 50%.
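A minimal sketch of the test in step 4, assuming the pod names and the 172.16.151.0/26 addressing shown later in this report:
$ oc exec -it sriov-net-mlx-cx6-lx-eno12399-pod1 -- ping -c 20 -I net1 172.16.151.11
$ oc exec -it sriov-net-mlx-cx6-lx-eno12399-pod2 -- ping -c 20 -I net1 172.16.151.10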
Actual results:
Packet loss > 50% when performing ICMP tests
Expected results:
No packet loss when performing ICMP tests
Additional info:
This cluster has a configuration similar to Verizon VCP100, with an AMD EPYC 9654P 96-core processor and the following bonding configuration (an illustrative sketch follows the list):
- bond0 (eno12399, eno12409) - ConnectX-6 Lx (25Gbps) [lacp] - machine-network (br-ex)
- bond1 (ens3f0, ens3f1) - ConnectX-6 Dx (100Gbps) [lacp]
- bond2 (ens6f0, ens6f1) - ConnectX-6 Dx (100Gbps) [active-passive]
- SRIOV VFs are created from eno12399 and eno12409 (ConnectX-6 Lx - 25Gbps)
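For reference only, a hedged sketch of how an LACP bond such as bond0 above is typically declared via an NMState NodeNetworkConfigurationPolicy; the interface names match the list above, while the policy name and everything else are assumptions and not taken from this cluster:
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: bond0-lacp-policy          # hypothetical name, not from this cluster
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
    - name: bond0
      type: bond
      state: up
      link-aggregation:
        mode: 802.3ad              # LACP, matching the [lacp] entries above
        port:
        - eno12399
        - eno12409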
NOTE: The same tests work in 4.14 and above (ICMP with no packet loss).
This is how SRIOV resources were prepared:
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.74   True        False         7d11h   Cluster version is 4.12.74

$ oc get nodes
NAME                                 STATUS   ROLES                  AGE     VERSION
master-0.vcp100.partnerci.bos2.lab   Ready    control-plane,master   7d11h   v1.25.16+1eb8682
master-1.vcp100.partnerci.bos2.lab   Ready    control-plane,master   7d11h   v1.25.16+1eb8682
master-2.vcp100.partnerci.bos2.lab   Ready    control-plane,master   7d11h   v1.25.16+1eb8682
worker-0.vcp100.partnerci.bos2.lab   Ready    worker                 7d11h   v1.25.16+1eb8682
worker-1.vcp100.partnerci.bos2.lab   Ready    worker                 7d11h   v1.25.16+1eb8682
worker-2.vcp100.partnerci.bos2.lab   Ready    worker                 7d11h   v1.25.16+1eb8682
worker-3.vcp100.partnerci.bos2.lab   Ready    worker                 7d11h   v1.25.16+1eb8682

$ cat sriov/sriov-policy-mlx-cx6-lx-eno12399.yml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: mlx-cx6-lx-eno12399-policy1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  isRdma: true
  mtu: 9000
  nicSelector:
    deviceID: "101f"
    pfNames:
    - eno12399#0-7
    vendor: 15b3
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numVfs: 8
  priority: 99
  resourceName: mlx_cx6_lx_eno12399_resource1

$ oc apply -f sriov/sriov-policy-mlx-cx6-lx-eno12399.yml
sriovnetworknodepolicy.sriovnetwork.openshift.io/mlx-cx6-lx-eno12399-policy1 created

$ cat sriov/sriov-network-mlx-cx6-lx-eno12399.yml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: mlx-cx6-lx-eno12399-net1
  namespace: openshift-sriov-network-operator
spec:
  logLevel: info
  networkNamespace: default
  resourceName: mlx_cx6_lx_eno12399_resource1
  spoofChk: "off"
  trust: "on"
  vlan: 3821
  capabilities: '{ "ips": true, "mac": true }'
  ipam: '{"type": "static"}'

$ oc apply -f sriov/sriov-network-mlx-cx6-lx-eno12399.yml
sriovnetwork.sriovnetwork.openshift.io/mlx-cx6-lx-eno12399-net1 created

$ oc get net-attach-def
NAME                       AGE
mlx-cx6-lx-eno12399-net1   2m14s
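The corresponding verification output was not captured for this report; as a hedged reference, these are the kinds of commands we would use to confirm the policy was applied and the VFs are advertised as allocatable resources:
$ oc get sriovnetworknodestates -n openshift-sriov-network-operator
$ oc describe node worker-0.vcp100.partnerci.bos2.lab | grep mlx_cx6_lx_eno12399_resource1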
This is how the pods were prepared:
$ cat sriov/sriov-net-mlx-cx6-lx-eno12399-pod1.yml
---
apiVersion: v1
kind: Pod
metadata:
  name: sriov-net-mlx-cx6-lx-eno12399-pod1
  annotations:
    k8s.v1.cni.cncf.io/networks: >
      [
        {
          "name": "mlx-cx6-lx-eno12399-net1",
          "mac": "00:11:22:33:44:01",
          "ips": ["172.16.151.10/26"],
          "namespace": "default"
        }
      ]
    cpu-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
    irq-load-balancing.crio.io: "disable"
spec:
  nodeName: worker-0.vcp100.partnerci.bos2.lab
  runtimeClassName: performance-blueprint-profile
  containers:
  - args:
    - while true; do sleep 99999999; done;
    command:
    - /bin/sh
    - -c
    - --
    image: mirror.gcr.io/wbitt/network-multitool:openshift
    imagePullPolicy: Always
    name: main
    resources:
      limits:
        cpu: "2"
        memory: 2Gi
        hugepages-1Gi: 2Gi
      requests:
        cpu: "2"
        memory: 2Gi
        hugepages-1Gi: 2Gi
    securityContext:
      capabilities:
        add:
        - IPC_LOCK
        - NET_ADMIN
        - AUDIT_WRITE
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
$ oc apply -f sriov/sriov-net-mlx-cx6-lx-eno12399-pod1.yml
pod/sriov-net-mlx-cx6-lx-eno12399-pod1 created
$ cat sriov/sriov-net-mlx-cx6-lx-eno12399-pod2.yml
---
apiVersion: v1
kind: Pod
metadata:
  name: sriov-net-mlx-cx6-lx-eno12399-pod2
  annotations:
    k8s.v1.cni.cncf.io/networks: >
      [
        {
          "name": "mlx-cx6-lx-eno12399-net1",
          "mac": "00:11:22:33:44:02",
          "ips": ["172.16.151.11/26"],
          "namespace": "default"
        }
      ]
    cpu-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
    irq-load-balancing.crio.io: "disable"
spec:
  nodeName: worker-1.vcp100.partnerci.bos2.lab
  runtimeClassName: performance-blueprint-profile
  containers:
  - args:
    - while true; do sleep 99999999; done;
    command:
    - /bin/sh
    - -c
    - --
    image: mirror.gcr.io/wbitt/network-multitool:openshift
    imagePullPolicy: Always
    name: main
    resources:
      limits:
        cpu: "2"
        memory: 2Gi
        hugepages-1Gi: 2Gi
      requests:
        cpu: "2"
        memory: 2Gi
        hugepages-1Gi: 2Gi
    securityContext:
      capabilities:
        add:
        - IPC_LOCK
        - NET_ADMIN
        - AUDIT_WRITE
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
$ oc apply -f sriov/sriov-net-mlx-cx6-lx-eno12399-pod2.yml
pod/sriov-net-mlx-cx6-lx-eno12399-pod2 created
$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
sriov-net-mlx-cx6-lx-eno12399-pod1 1/1 Running 0 53s 10.131.0.31 worker-0.vcp100.partnerci.bos2.lab <none> <none>
sriov-net-mlx-cx6-lx-eno12399-pod2 1/1 Running 0 42s 10.130.2.53 worker-1.vcp100.partnerci.bos2.lab <none>
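For completeness, the secondary (net1) attachment can be cross-checked from the Multus network-status annotation; this is a hedged example of the command only, its output was not captured for this report:
$ oc get pod sriov-net-mlx-cx6-lx-eno12399-pod1 -o jsonpath='{.metadata.annotations.k8s\.v1\.cni\.cncf\.io/network-status}'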
ICMP tests between pods and to the GW (172.16.151.1) show a high percentage of packet loss:
$ oc exec -it sriov-net-mlx-cx6-lx-eno12399-pod1 -- /bin/bash
sriov-net-mlx-cx6-lx-eno12399-pod1:/$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0@if49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue state UP group default
    link/ether 0a:58:0a:83:00:1f brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.131.0.31/23 brd 10.131.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::858:aff:fe83:1f/64 scope link
       valid_lft forever preferred_lft forever
41: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether 00:11:22:33:44:01 brd ff:ff:ff:ff:ff:ff permaddr 62:04:57:80:72:9c
    inet 172.16.151.10/26 brd 172.16.151.63 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::211:22ff:fe33:4401/64 scope link
       valid_lft forever preferred_lft forever
sriov-net-mlx-cx6-lx-eno12399-pod1:/$ ping -c4 172.16.151.11
PING 172.16.151.11 (172.16.151.11) 56(84) bytes of data.
From 172.16.151.10 icmp_seq=1 Destination Host Unreachable

--- 172.16.151.11 ping statistics ---
4 packets transmitted, 0 received, +1 errors, 100% packet loss, time 3072ms
pipe 4
sriov-net-mlx-cx6-lx-eno12399-pod1:/$ ping -c4 172.16.151.1
PING 172.16.151.1 (172.16.151.1) 56(84) bytes of data.
64 bytes from 172.16.151.1: icmp_seq=1 ttl=64 time=0.317 ms
64 bytes from 172.16.151.1: icmp_seq=4 ttl=64 time=0.242 ms

--- 172.16.151.1 ping statistics ---
4 packets transmitted, 2 received, 50% packet loss, time 3111ms
rtt min/avg/max/mdev = 0.242/0.279/0.317/0.037 ms

$ oc exec -it sriov-net-mlx-cx6-lx-eno12399-pod2 -- /bin/bash
sriov-net-mlx-cx6-lx-eno12399-pod2:/$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0@if47: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue state UP group default
    link/ether 0a:58:0a:82:02:35 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.130.2.53/23 brd 10.130.3.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::858:aff:fe82:235/64 scope link
       valid_lft forever preferred_lft forever
40: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether 00:11:22:33:44:02 brd ff:ff:ff:ff:ff:ff permaddr 46:31:60:84:c6:c4
    inet 172.16.151.11/26 brd 172.16.151.63 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::211:22ff:fe33:4402/64 scope link
       valid_lft forever preferred_lft forever
sriov-net-mlx-cx6-lx-eno12399-pod2:/$ ping -c4 172.16.151.10
PING 172.16.151.10 (172.16.151.10) 56(84) bytes of data.
From 172.16.151.11 icmp_seq=1 Destination Host Unreachable

--- 172.16.151.10 ping statistics ---
4 packets transmitted, 0 received, +1 errors, 100% packet loss, time 3110ms
pipe 4
sriov-net-mlx-cx6-lx-eno12399-pod2:/$ ping -c4 172.16.151.1
PING 172.16.151.1 (172.16.151.1) 56(84) bytes of data.
64 bytes from 172.16.151.1: icmp_seq=3 ttl=64 time=0.248 ms

--- 172.16.151.1 ping statistics ---
4 packets transmitted, 1 received, 75% packet loss, time 3099ms
rtt min/avg/max/mdev = 0.248/0.248/0.248/0.000 ms
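Host-side VF state was not captured during these tests; as a hedged example only, this is the kind of check we would run on the worker node to inspect the VF link state and the spoofchk/trust settings applied by the policy:
$ oc debug node/worker-0.vcp100.partnerci.bos2.lab -- chroot /host ip link show eno12399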