Red Hat OpenStack Services on OpenShift
OSPRH-12454

BZ#2323844 [bug][RHOS17.1] Infra vlans not working when deploying a compute with its bond on a nic-partitioned vf

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Normal
    • Target: rhos-17.1.z
    • Team: rhos-connectivity-nfv
    • Severity: Important

      This BZ is cloned for tracking the fix from openvswitch, while the original BZ#2310427 will be used for the proposed workaround in tripleo.

      +++ This bug was initially created as a clone of Bug #2310427 +++

      Description of problem:

      • We set up NIC partitioning to run bond0, with all the infra VLANs, on top of two VFs.
        The network configuration is as follows:
        [Diagram: one VF from the PF on ConnectX-6 Lx NIC#1 and one VF from the PF on ConnectX-6 Lx NIC#2 are enslaved to bond0 (mode=1); vlan39XX on top of bond0 carries 192.168.2.X]
      • This setup works fine on osp16.2 / RHEL8.4.
      • On osp17.1/RHEL9.2 this works only if the VF is in promisc mode:
        10: p1p1_0: <BROADCAST,MULTICAST,PROMISC,SLAVE,UP,LOWER_UP> mtu 9050 qdisc mq master bond0 state UP group default qlen 1000
      • The NFV docs mention that you can indeed put the VF in promiscuous mode,
        but they do not specify whether this is required in order to run infra VLANs on top of it.

      Version-Release number of selected component (if applicable):
      [redhat-release] Red Hat Enterprise Linux release 9.2 (Plow)
      [rhosp-release] Red Hat OpenStack Platform release 17.1.3 (Wallaby)
      openvswitch3.1-3.1.0-104.el9fdp.x86_64

      How reproducible:
      Every time.

      Steps to Reproduce:

      • OSP17.1 environments that have been upgraded from OSP16.2

      Actual results:
      It works only if we:

      • set the VF in promisc mode (sketched just below), or
      • disable openvswitch:
        . systemctl enable tripleo*
        . systemctl disable openvswitch.service
        . moved away /usr/lib/systemd/system/ovsdb-server.service
        . moved away /usr/lib/systemd/system/ovs-delete-transient-ports.service
        . moved away /usr/lib/systemd/system/ovs-vswitchd.service
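
      A minimal sketch of the promisc part of this workaround (the VF netdev name p1p1_0 is taken from the example above; the name of the second bond member's VF is an assumption and will differ per host):

      ip link set dev p1p1_0 promisc on   # first bond member VF (name from the example above)
      ip link set dev p1p2_0 promisc on   # assumed name of the second bond member's VF
      ip -d link show dev p1p1_0          # the PROMISC flag should now be listed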

      Expected results:
      It should work as on 16.2 with the exact same configuration.

      Additional info:

      • The problem reproduces as well if the VLAN is configured on top of the VF (without bonding):
        [Diagram: ConnectX-6 Lx NIC#1: PF → VF → vlan39xx → 192.168.2.x]
      • If the VLAN is configured on top of the PF interface, everything works and no promisc mode is needed:
        [Diagram: ConnectX-6 Lx NIC#1: PF → vlan39xx → 192.168.2.x]
      • Firmware versions tried on osp17:
      • 26.41.1002
      • 26.39.1002
      • 26.38.1002
      • 26.36.1010

      — Additional comment from Luigi Tamagnone on 2024-09-06 15:00:57 UTC —

      • sos report of oscar23com089, which was deployed under osp17 (not working without promisc)
      • sos report of oscar22com240 running on osp16.2 (working perfectly fine)

      Detailed description from the customer in c#66:
      """
      So the openstack installation exists in 2 parts: the OS
      installation + network deployment, and the OSP installation.

      1. OS + network which gets done with the command: openstack overcloud node provision

      • After this phase the network is working fine in NON promiscuous mode
        (sosreport-oscar23com088-2024-08-20-dlpydhr.tar.xz)
      • After a reboot the network is still working fine in NON promiscuous mode
        (sosreport-oscar23com088-2024-08-20-wicnavi.tar.xz)

      This proves to me that there is no issue with the image, the actual way
      we do the bonding, or any firmware thing, as in this phase you can
      still reboot as much as you like and it stays working.

      2. OSP deployment which gets done with the command: openstack overcloud deploy

      • After this phase the network is working fine in NON promiscuous mode
        (sosreport-oscar23com088-2024-08-22-pzzzzrp.tar.xz)
      • After a reboot the network is broken in NON promiscuous mode
        (sosreport-oscar23com088-2024-08-22-jrmuqfi.tar.xz)

      This shows that some of the settings that the OSP deploy puts in place,
      which take effect only after the reboot, break the networking in NON
      promiscuous mode.
      """

      — Additional comment from Ella Shulman on 2024-09-08 11:43:39 UTC —

      Hi, can you please specify what is referred to in this ticket as the infra network? Also, sharing the templates would help a lot in understanding and reproducing the issue.

      — Additional comment from Luigi Tamagnone on 2024-09-09 08:35:59 UTC —

      > can you please specify what is referred to in this ticket as the infra network?
      The management, storage, and tenant VLAN networks are attached to bond0.

      The templates are on case 03890610; the files and the bash script used for the deploy are in templates-osp17.tar.gz.

      — Additional comment from Benjamin Poirier on 2024-09-10 14:45:12 UTC —

      I passed on the information from this ticket to Maor Dickman from Nvidia. He thinks this issue is not related to OVS and he asked:
      > Did you tried to reproduce with simple OVS configuration? Or Legacy SRIOV?

      — Additional comment from Ella Shulman on 2024-09-10 15:21:07 UTC —

      Hi Luigi

      I took a deeper look into the case. The reason it is failing is that you cannot co-allocate the tenant network like this when using VFs; please use a separate NIC for the tenant network. I'll add a request for additional doc text on this.

      BR

      — Additional comment from Luigi Tamagnone on 2024-09-11 12:39:14 UTC —

      I think there was a misunderstanding between dpdkbond0[1][2] and bond0[3][4].
      The tenant network is on dpdkbond0 and some VLANs are on bond0.

      The issue is on bond0

      [1] osp16
      {
        "addresses": [
          { "ip_netmask": "192.168.32.242/22" }
        ],
        "members": [
          {
            "members": [
              {
                "driver": "mlx5_core",
                "members": [
                  { "name": "p2p1", "type": "interface" }
                ],
                "mtu": 9050,
                "name": "dpdk0",
                "type": "ovs_dpdk_port"
              },
              {
                "driver": "mlx5_core",
                "members": [
                  { "name": "p2p2", "type": "interface" }
                ],
                "mtu": 9050,
                "name": "dpdk1",
                "type": "ovs_dpdk_port"
              }
            ],
            "mtu": 9050,
            "name": "dpdkbond0",
            "ovs_options": "bond_mode=balance-slb lacp=active other_config:lacp-time=fast",
            "rx_queue": 4,
            "type": "ovs_dpdk_bond"
          }
        ],
        "name": "br-ex",
        "ovs_extra": [
          "set port br-ex tag=3955"
        ],
        "type": "ovs_user_bridge",
        "use_dhcp": false
      },
      [2] osp17

      - addresses:
        - ip_netmask: 192.168.1.208/24
        members:
        - members:
          - driver: mlx5_core
            members:
            - name: p2p1
              type: interface
            mtu: 9050
            name: dpdk0
            type: ovs_dpdk_port
          - driver: mlx5_core
            members:
            - name: p2p2
              type: interface
            mtu: 9050
            name: dpdk1
            type: ovs_dpdk_port
          mtu: 9050
          name: dpdkbond0
          ovs_options: bond_mode=balance-slb lacp=active other_config:lacp-time=fast
          rx_queue: 1
          type: ovs_dpdk_bond
        name: br-ex
        ovs_extra: set port br-ex tag=3955
        type: ovs_user_bridge
        use_dhcp: false
      [3] osp16
      (bond0 configuration with "bonding_options" and "addresses" sections; the JSON snippet did not render in this ticket)

      [4] osp17

      - bonding_options: miimon=100 mode=1
        dns_servers: ['10.34.255.252', '10.34.255.253']
        domain: []
        members:
        - device: p1p1
          type: sriov_vf
          vfid: 0
          promisc: false
        - device: p1p2
          type: sriov_vf
          vfid: 0
          promisc: false
        mtu: 9050
        name: bond0
        type: linux_bond
        use_dhcp: false
      - addresses:
        - ip_netmask: 192.168.2.67/24
        device: bond0
        mtu: 9000
        type: vlan
        vlan_id: 3951

      — Additional comment from Karthik Sundaravel on 2024-09-18 16:02:12 UTC —

      Hi Luigi

      Can we add the `primary` field in the bond and check the behaviour?

      - bonding_options: miimon=100 mode=1
        dns_servers: ['10.34.255.252', '10.34.255.253']
        domain: []
        members:
        - device: p1p1
          type: sriov_vf
          vfid: 0
          promisc: false
          primary: true
        - device: p1p2
          type: sriov_vf
          vfid: 0
          promisc: false
        mtu: 9050
        name: bond0
        type: linux_bond
        use_dhcp: false
      - addresses:
        - ip_netmask: 192.168.2.67/24
        device: bond0
        mtu: 9000
        type: vlan
        vlan_id: 3951

      — Additional comment from Luigi Tamagnone on 2024-09-19 10:06:37 UTC —

      @Karthik Unfortunately, there is no change in behaviour; the customer still doesn't have network connectivity.

      — Additional comment from Benjamin Poirier on 2024-09-20 14:40:51 UTC —

      I tried a few different ways based on the ascii art diagrams and the problem
      did not reproduce. For instance, I tried the following:

      devlink dev eswitch set pci/0000:08:00.0 mode switchdev
      echo 1 > /sys/bus/pci/devices/0000:08:00.0/sriov_numvfs
      udevadm settle
      ip link add br0 up type bridge
      ip link set dev eth2 up master br0 # PF
      ip link set dev eth4 up master br0 # VF PR
      ip link set dev eth5 up # actual VF
      ip addr add 192.168.1.1/24 dev eth5
      ping -c4 192.168.1.2 # ok
      ip link add eth5.39 link eth5 up type vlan id 39
      ip addr add 192.168.2.1/24 dev eth5.39
      ping -c4 192.168.2.2 # ok
      systemctl start openvswitch.service
      ip link show dev eth5 # no "PROMISC" flag
      ping -c4 192.168.2.2 # ok

      In the above, I used kernel 5.14.0-284.30.1.el9_2.x86_64, adapter CX-6 Lx with
      firmware 26.41.1000.

      Presumably, more specific openvswitch configuration is needed to reproduce the
      problem but I can't guess what it is, especially given that I have next to no
      experience with OVS.

      Can you try to simplify the reproduction environment (i.e. without OSP)
      and provide detailed reproduction instructions?

      — Additional comment from Karthik Sundaravel on 2024-09-23 09:27:37 UTC —

      Hi Benjamin,

      I'll try to make a simplified reproducer without OSP.
      The issue is seen with legacy SR-IOV and not switchdev.
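
      For reference, which eswitch mode a PF is in can be checked with devlink (a sketch; the PCI address is the one used in the earlier non-OSP reproduction attempt and must be adapted to the host under test):

      devlink dev eswitch show pci/0000:08:00.0   # reports "mode legacy" or "mode switchdev"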

      — Additional comment from Karthik Sundaravel on 2024-09-25 12:22:24 UTC —

      Hi Luigi,

      Can you please share the model of the NIC - CX6 or CX5?

      Karthik

      — Additional comment from Luigi Tamagnone on 2024-09-25 13:07:22 UTC —

      It should be
      3 dual-port (6) Mellanox Technologies MT2894 Family [ConnectX-6 Lx] [15b3:101f]

      — Additional comment from on 2024-09-25 15:13:19 UTC —

      Team,

      I am enabling the escalation flag on this bug, as the case assigned to it was escalated by the TAM of the Belgacom customer.

      This issue will be a big problem when they upgrade their telco cloud cluster 2, which is planned for 1 November. That still allows us some time, BUT we have been going on for about 2 months with this case and have not yet come to a conclusion, hence they would like to re-engage our attention on this issue now that the urgent problems have been resolved post-upgrade. We really need to know what's going on and how we should proceed, as we estimate that almost 40 servers of telco cluster 2 will be impacted.

      If the bug could be prioritized, that would be appreciated.

      Regards,

      Joanna

      Senior Escalation Manager

      — Additional comment from Nate Johnston on 2024-09-26 12:49:36 UTC —

      @jfindysz@redhat.com If this is a priority engineering escalation please follow the RHOS Prio escalation steps at https://spaces.redhat.com/display/RHOSPRIO/RHOSP+Priority+List+%28v2.0%29+Workflow for proper engineering engagement at an escalated level.

      — Additional comment from Karthik Sundaravel on 2024-09-26 17:55:43 UTC —

      — Additional comment from Karthik Sundaravel on 2024-09-26 18:02:19 UTC —

      Steps to reproduce
      ------------------
      STEP 1) Download the config file. Please modify the entries tagged with "=> CHANGE ME". (A hypothetical sketch of such a file is shown at the end of these steps.)
      STEP 2)
      Download os-net-config from the git repo https://github.com/os-net-config/os-net-config.git
      cd os-net-config; git fetch -v --all; git switch -c stable/wallaby origin/stable/wallaby
      python setup.py install --prefix=/usr
      os-net-config -d -c <path to the config file>
      Dependencies:
      Python 3.7.0 or higher is required. Other modules can be installed via pip.

      STEP 3) Repeat the above steps on a second machine with a different IP address.

      STEP 4) Ping from one machine to another. It works now.

      STEP 5) Reboot one machine. Ping doesn't work.

      STEP 6) On the rebooted machine, do
      ip link set dev <device name for VF-id> promisc on => repeat this for second interface as well.
      Ping works now.

      or

      ip link set dev <device name for VF-id> down => Ping works in my setup
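
      For readers without access to the attached config file, a minimal hypothetical sketch of what such a file might contain, modelled on the [4] osp17 bond0 snippet quoted earlier in this ticket (interface names, VLAN ID, and addresses are placeholders to be adapted):

      network_config:
      - type: sriov_pf
        name: p1p1           # => CHANGE ME (first PF)
        mtu: 9050
        numvfs: 4
      - type: sriov_pf
        name: p1p2           # => CHANGE ME (second PF)
        mtu: 9050
        numvfs: 4
      - type: linux_bond
        name: bond0
        mtu: 9050
        use_dhcp: false
        bonding_options: miimon=100 mode=1
        members:
        - type: sriov_vf
          device: p1p1
          vfid: 0
          promisc: false
        - type: sriov_vf
          device: p1p2
          vfid: 0
          promisc: false
      - type: vlan
        device: bond0
        vlan_id: 3951        # => CHANGE ME
        mtu: 9000
        addresses:
        - ip_netmask: 192.168.2.67/24   # => CHANGE ME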

      Kernel version
      [tripleo-admin@compute-0 ~]$ uname -r
      5.14.0-284.82.1.el9_2.x86_64

      Driver/FW version:
      [tripleo-admin@compute-0 ~]$ ethtool -i ens2f0np0
      driver: mlx5_core
      version: 5.14.0-284.82.1.el9_2.x86_64
      firmware-version: 26.36.1010 (MT_0000000532)
      expansion-rom-version:
      bus-info: 0000:17:00.0
      supports-statistics: yes
      supports-test: yes
      supports-eeprom-access: no
      supports-register-dump: no
      supports-priv-flags: yes

      Device: Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]

      I did the above steps to reproduce the issue.

      — Additional comment from Benjamin Poirier on 2024-09-27 21:37:07 UTC —

      I followed the instructions in comment 16 but faced a few errors and
      ultimately there was no "mellanox_bond" interface.

      I used an up to date RHEL-9.2 install. Here are the commands that I ran:

      1. git clone https://github.com/os-net-config/os-net-config.git
      2. cd os-net-config/
      3. git fetch -v --all
      4. git switch -c stable/wallaby origin/stable/wallaby
      5. python setup.py install --prefix=/usr
      6. os-net-config -d -c ~/config_mellanox_no_promisc.yaml
      7. pip install oslo_concurrency
      8. os-net-config -d -c ~/config_mellanox_no_promisc.yaml
      9. pip install pyudev
      10. os-net-config -d -c ~/config_mellanox_no_promisc.yaml
      11. pip install jsonschema
      12. os-net-config -d -c ~/config_mellanox_no_promisc.yaml
        [...]
        NoneType: None
        Traceback (most recent call last):
        File "/usr/bin/os-net-config", line 10, in <module>
        sys.exit(main())
        File "/usr/lib/python3.9/site-packages/os_net_config/cli.py", line 360, in main
        pf_files_changed = provider.apply(cleanup=opts.cleanup,
        File "/usr/lib/python3.9/site-packages/os_net_config/impl_ifcfg.py", line 2020, in apply
        self.ifdown(interface)
        File "/usr/lib/python3.9/site-packages/os_net_config/_init_.py", line 500, in ifdown
        self.execute(msg, '/sbin/ifdown', interface, check_exit_code=False)
        File "/usr/lib/python3.9/site-packages/os_net_config/_init_.py", line 480, in execute
        out, err = processutils.execute(cmd, *args, **kwargs)
        File "/usr/local/lib/python3.9/site-packages/oslo_concurrency/processutils.py", line 401, in execute
        obj = subprocess.Popen(cmd,
        File "/usr/lib64/python3.9/subprocess.py", line 951, in _init_
        self._execute_child(args, executable, preexec_fn, close_fds,
        File "/usr/lib64/python3.9/subprocess.py", line 1821, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
        FileNotFoundError: [Errno 2] No such file or directory: '/sbin/ifdown'
      13. dnf install -y NetworkManager-initscripts-updown
      14. os-net-config -d -c ~/config_mellanox_no_promisc.yaml
        [...]
        2024-09-28 00:14:00.203 INFO os_net_config.execute running ifup on interface: enp8s0f0v1
        2024-09-28 00:14:00.394 INFO os_net_config.execute running ifup on interface: enp8s0f1v1
        2024-09-28 00:14:00.582 INFO os_net_config.execute running ifup on interface: mellanox_bond
        2024-09-28 00:14:00.612 ERROR os_net_config.impl_ifcfg.apply Failure(s) occurred when applying configuration
        2024-09-28 00:14:00.612 ERROR os_net_config.impl_ifcfg.apply stdout: , stderr: Error: unknown connection '/etc/sysconfig/network-scripts/ifcfg-enp8s0f0v1'.
        Failure to activate file "enp8s0f0v1"!

      See all profiles with `nmcli connection`.
      Reload files from disk with `nmcli connection reload`
      Activate the desired profile with `nmcli connection up \"$NAME\"`

      2024-09-28 00:14:00.612 ERROR os_net_config.impl_ifcfg.apply stdout: , stderr: Error: unknown connection '/etc/sysconfig/network-scripts/ifcfg-enp8s0f1v1'.
      Failure to activate file "enp8s0f1v1"!

      See all profiles with `nmcli connection`.
      Reload files from disk with `nmcli connection reload`
      Activate the desired profile with `nmcli connection up \"$NAME\"`

      2024-09-28 00:14:00.612 ERROR os_net_config.impl_ifcfg.apply stdout: , stderr: Error: unknown connection '/etc/sysconfig/network-scripts/ifcfg-mellanox_bond'.
      Failure to activate file "mellanox_bond"!

      See all profiles with `nmcli connection`.
      Reload files from disk with `nmcli connection reload`
      Activate the desired profile with `nmcli connection up \"$NAME\"`

      2024-09-28 00:14:00.612 ERROR os_net_config.main **Failed to configure with ifcfg provider**
      ConfigurationError('Failure(s) occurred when applying configuration')
      2024-09-28 00:14:00.612 ERROR os_net_config.common.log_exceptions Traceback (most recent call last):
      File "/usr/bin/os-net-config", line 10, in <module>
      sys.exit(main())
      File "/usr/lib/python3.9/site-packages/os_net_config/cli.py", line 392, in main
      files_changed = provider.apply(cleanup=opts.cleanup,
      File "/usr/lib/python3.9/site-packages/os_net_config/impl_ifcfg.py", line 2147, in apply
      raise os_net_config.ConfigurationError(message)
      os_net_config.ConfigurationError: Failure(s) occurred when applying configuration
      NoneType: None
      Traceback (most recent call last):
      File "/usr/bin/os-net-config", line 10, in <module>
      sys.exit(main())
      File "/usr/lib/python3.9/site-packages/os_net_config/cli.py", line 392, in main
      files_changed = provider.apply(cleanup=opts.cleanup,
      File "/usr/lib/python3.9/site-packages/os_net_config/impl_ifcfg.py", line 2147, in apply
      raise os_net_config.ConfigurationError(message)
      os_net_config.ConfigurationError: Failure(s) occurred when applying configuration

      15. ls /etc/sysconfig/network-scripts/
        ifcfg-enp8s0f0np0 ifcfg-enp8s0f0v1 ifcfg-enp8s0f1np1 ifcfg-enp8s0f1v1 ifcfg-mellanox_bond readme-ifcfg-rh.txt
      16. nmcli con
        NAME UUID TYPE DEVICE
        enp5s0 bbb03040-9469-4436-9537-4e6ecafadeff ethernet enp5s0
        enp4s0 d05673ca-6f4f-44be-ae6b-353b18a83f1d ethernet enp4s0
        lo 205e4428-2079-4e7a-89da-4cb811c0ce8d loopback lo
        System enp8s0f0np0 8cfe20f3-2c47-a269-16cf-ed6e17919c74 ethernet enp8s0f0np0
        System enp8s0f1np1 a3c65d4a-a91d-7bd5-63bd-f6f55fd22cc8 ethernet enp8s0f1np1

      All of the ifcfg-* files under /etc/sysconfig/network-scripts/ were created by
      os-net-config but NetworkManager only loads ifcfg-enp8s0f0np0 and
      ifcfg-enp8s0f1np1. I noticed this difference:

      17. grep NM_CONTROLLED ifcfg-*
        ifcfg-enp8s0f0np0:NM_CONTROLLED=yes
        ifcfg-enp8s0f0v1:NM_CONTROLLED=no
        ifcfg-enp8s0f1np1:NM_CONTROLLED=yes
        ifcfg-enp8s0f1v1:NM_CONTROLLED=no
        ifcfg-mellanox_bond:NM_CONTROLLED=no

      So it seems expected that NetworkManager will not load some of those files.

      Do the files have similar content when you follow the instructions? Does
      NetworkManager load them?

      Since you did not mention installing NetworkManager-initscripts-updown, is it
      expected that I ran into the first quoted error (FileNotFoundError) before
      installing that package?

      Let me know if you have some additional suggestions.

      — Additional comment from Karthik Sundaravel on 2024-09-28 10:30:41 UTC —

      Hi Benjamin,

      In OSP, we use the package openstack-network-scripts (aka initscripts) for the ifup / ifdown commands.
      So please remove the package `NetworkManager-initscripts-updown` and install openstack-network-scripts.

      I fetched the version of openstack-network-scripts from another system, and it should be more or less the same as the one I have used for reproducing the issue.

      Name : openstack-network-scripts
      Version : 10.11.1
      Release : 9.17_1.1.el9ost
      Architecture : x86_64
      Size : 161 k
      Source : openstack-network-scripts-10.11.1-9.17_1.1.el9ost.src.rpm
      Repository : @System
      From repo : rhos-17.1
      Summary : Legacy scripts for manipulating of network devices
      URL : https://github.com/fedora-sysv/initscripts
      License : GPLv2

      — Additional comment from Nate Johnston on 2024-10-01 20:56:55 UTC —

      Adding link to RHOS Prio ticket

      — Additional comment from Benjamin Poirier on 2024-10-02 20:59:10 UTC —

      I installed os-net-config from the rhoso-18.0-for-rhel-9-x86_64-rpms
      repository.

      > STEP 4) Ping from one machine to another. It works now.

      Indeed

      > STEP 5) Reboot one machine. Ping doesn't work.

      After reboot, the mellanox_bond interface does not exist.
      I started the "network" init script (part of os-net-config) manually but it
      reported some errors and failed:

      Oct 02 23:30:49 c-236-4-240-243 network[2110]: Bringing up interface mellanox_bond:
      Oct 02 23:30:49 c-236-4-240-243 network[2477]: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device enp8s0f0v1 does not seem to be present, delaying initialization.
      Oct 02 23:30:49 c-236-4-240-243 /etc/sysconfig/network-scripts/ifup-eth[2500]: Device enp8s0f0v1 does not seem to be present, delaying initialization.
      Oct 02 23:30:49 c-236-4-240-243 network[2407]: WARN : [/etc/sysconfig/network-scripts/ifup-eth] Unable to start slave device ifcfg-enp8s0f0v1 for master mellanox_bond.
      Oct 02 23:30:49 c-236-4-240-243 /etc/sysconfig/network-scripts/ifup-eth[2501]: Unable to start slave device ifcfg-enp8s0f0v1 for master mellanox_bond.
      Oct 02 23:30:49 c-236-4-240-243 network[2502]: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device enp8s0f1v1 does not seem to be present, delaying initialization.
      Oct 02 23:30:49 c-236-4-240-243 /etc/sysconfig/network-scripts/ifup-eth[2525]: Device enp8s0f1v1 does not seem to be present, delaying initialization.
      Oct 02 23:30:49 c-236-4-240-243 network[2407]: WARN : [/etc/sysconfig/network-scripts/ifup-eth] Unable to start slave device ifcfg-enp8s0f1v1 for master mellanox_bond.

      The VF interfaces are not present. While config_mellanox_no_promisc.yaml
      includes a directive to create 4 VFs:

      - type: sriov_pf
        name: nic11 => CHANGE ME
        mtu: 9000
        numvfs: 4

      ... this information does not seem to be reflected in the files that were
      created under /etc/sysconfig/network-scripts:

      root@c-236-4-240-243:/etc/sysconfig/network-scripts# cat ifcfg-enp8s0f0np0
      # This file is autogenerated by os-net-config
      DEVICE=enp8s0f0np0
      ONBOOT=yes
      HOTPLUG=no
      NM_CONTROLLED=yes
      PEERDNS=no
      BOOTPROTO=none
      MTU=9000
      DEFROUTE=no
      root@c-236-4-240-243:/etc/sysconfig/network-scripts# cat ifcfg-enp8s0f0v1
      # This file is autogenerated by os-net-config
      DEVICE=enp8s0f0v1
      ONBOOT=yes
      HOTPLUG=no
      NM_CONTROLLED=no
      PEERDNS=no
      MASTER=mellanox_bond
      SLAVE=yes
      BOOTPROTO=none

      So I'm not sure how this is supposed to work.

      Did you try the reproduction instructions on RHEL-9.2? How were the interfaces
      defined in the yaml file created after boot?

      — Additional comment from Benjamin Poirier on 2024-10-02 21:02:02 UTC —

      > I installed os-net-config from the rhoso-18.0-for-rhel-9-x86_64-rpms
      ^
      I meant "openstack-network-scripts", sorry.

      — Additional comment from Karthik Sundaravel on 2024-10-03 01:54:15 UTC —

      os-net-config creates /var/lib/os-net-config/sriov_config.yaml, where the numvfs and other VF configurations are stored.
      os-net-config also adds a sriov_config service file.
      On reboot, the os-net-config sriov_config service reads sriov_config.yaml and applies the settings.

      And then the network service brings up the bonds configured in the ifcfg files.
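
      For anyone checking this on a reproducer node, the pieces involved can be inspected with something like (a sketch):

      cat /var/lib/os-net-config/sriov_config.yaml   # numvfs and other VF settings written by os-net-config
      systemctl status sriov_config.service          # applies sriov_config.yaml at boot
      systemctl status network.service               # brings up the bonds from the ifcfg files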

      — Additional comment from Benjamin Poirier on 2024-10-04 13:10:22 UTC —

      > During reboot, os-net-config sriov_config service will read the sriov_config.yaml and apply the settings.

      At the time when I wrote comment 20, "sriov_config.service" was failing and I
      didn't notice. It was failing because I had installed os-net-config in a venv
      instead of system-wide and the service file doesn't handle that. I installed
      it under /usr like the original instructions said, and I also enabled
      "network.service"; then the network config was applied at boot as expected.

      > STEP 5) Reboot one machine. Ping doesn't work.

      In my case, now that the network services are starting properly, the problem
      does not reproduce; ping works after reboot and the vf interfaces do NOT have
      the promisc flag. I had a call with Karthik yesterday and showed him that.

      I guess the problem depends on some more specific configuration to reproduce.
      Can you please try to narrow it down?

      — Additional comment from Madhur Gupta on 2024-10-11 11:15:11 UTC —

      Hello ksundara@redhat.com and bpoirier@redhat.com,

      Do you need anything from our side or any data from the customer to help expedite the resolution?

      As informed earlier, their upgrade is planned for 1st of November and they won't be able to push this date further.

      @njohnston@redhat.com please let us know if we can help with anything. If you need to connect with the customer's reproducer, it can be provided as well.

      Regards,
      Madhur Gupta
      TAM for Belgacom

      — Additional comment from Karthik Sundaravel on 2024-10-12 01:29:28 UTC —

      Benjamin (partner engineer from Nvidia) is working on the issue. This needs investigation from Nvidia, since the PF/VF configurations applied by os-net-config in both the working (OSP16.2) and non-working (OSP17.1) cases are the same, but we are seeing different behaviour from the SR-IOV NIC.

      @Madhur
      We have reproduced the issue on our development machines and given Benjamin access to investigate. We have "ConnectX-5 Ex" in our lab, while the customer has seen this issue on "ConnectX-6 Lx". If we could get a couple of machines from the customer (where the issue is seen) for Benjamin, that would be helpful as well.

      @Benjamin, we have a deadline of 1st November. Please note that we have a high priority and date pressure to have a fix by then.

      — Additional comment from Benjamin Poirier on 2024-10-15 16:12:11 UTC —

      Karthik provided access to a system at Red Hat where the problem occurs. I
      began to investigate the situation on that system. It did not use VLANs; it
      was just a bond over two VFs. I observed the following:
      *)
      When the problem occurred, I deleted the bond and assigned the IP address
      directly on the VF that was the active bond member. The problem continued, so
      it might not be related to bond or vlan. In the same way as reported in the
      description, after setting that VF to promisc mode, the problem was resolved
      (ping worked).
      *)
      When the problem occurs, `ip -s link` shows that the packet RX counter on the
      VF section of the PF netdev increases, but the packet RX counter on the VF
      netdev itself does not increase.
      `ethtool -S` on the VF shows that the rx_steer_missed_packets counter
      increases.

      I tried to dump the steering rules on the adapter using 'mlxdump fsdump' but
      it did not work. I opened a ticket for this at Nvidia (RM-4124320).
      *)
      If I do `systemctl disable openvswitch.service` and reboot, the problem does
      not occur. However, openvswitch still gets started at boot by network.service.
      So there might be different behavior depending on how/when OVS is started.
      Moreover, the OVS configuration does not actually include the ConnectX nic
      AFAIK. It includes two Intel nics.
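
      For reference, the counters mentioned in the observations above can be checked with something like the following (a sketch; replace the placeholders with the PF and VF netdev names on the affected host):

      ip -s link show dev <pf netdev>                          # per-VF RX counters appear under the vf entries
      ip -s link show dev <vf netdev>                          # RX counter on the VF netdev itself
      ethtool -S <vf netdev> | grep rx_steer_missed_packets    # increases while traffic is being dropped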

      Can you try again to provide simple but complete reproduction instructions?

      — Additional comment from Karthik Sundaravel on 2024-10-16 14:23:16 UTC —

      Benjamin,

      I'll try to reproduce the issue on a non-OpenStack setup. I'll share the steps when I have one.

      Meanwhile, as we speak, the OVS bridges have all been cleaned up on those machines and we still see some interference between openvswitch and the Mellanox cards.
      Does this call for a look from the openvswitch team?

      — Additional comment from Greg Rakauskas on 2024-10-16 15:57:52 UTC —

      Hi Eran,

      Will this BZ be verified for 17.1.4?

      We need to know whether to add this BZ to the RHOSP 17.1.4 Release Notes.

      Thanks,
      --Greg R.

      — Additional comment from Karthik Sundaravel on 2024-10-16 16:42:17 UTC —

      Hi Madhur,

      We (Benjamin and I) have found that disabling DPDK solves the connectivity issue. We would like to understand: in OSP16.2, does the customer use DPDK on any port (it need not be on Mellanox NICs) in the affected node?

      — Additional comment from Madhur Gupta on 2024-10-17 13:28:17 UTC —

      (In reply to Karthik Sundaravel from comment #29)
      > Hi Madhur,
      >
      > We (Benjamin and myself) have found that disabling DPDK solves the
      > connectivity issue. We would like to understand if in OSP16.2, does the
      > customer use DPDK on any port (need not be mellanox nics) in the affected
      > node ?

      Hi Karthik,

      >We would like to understand if in OSP16.2, does the customer use DPDK on any port (need not be mellanox nics) in the affected node ?

      Yes, the customer has confirmed that they faced the issue with DPDK-enabled workloads, but they will try to reproduce it in a non-DPDK environment.

      However, DPDK is important for the customer's workload.

      Let me know if you both need anything else.

      — Additional comment from Madhur Gupta on 2024-10-17 17:03:31 UTC —

      Hello Karthik and Benjamin,

      Here is the response from the customer contact:

      "
      Hey Guys,

      I just had a look at it and indeed we only see the issue on the computes that also have dpdk.

      We have similar computes which don't have dpdk but still have the same vf setup; they are not affected by the issue.

      One caveat to make, which is also mentioned in the case already (and I think one of the engineers mentioned it again):

      The interfaces that are used for ovs-dpdk are not the same interfaces as the ones used for the VFs and infra VLANs.

      They come from completely different network cards.

      Yet for some reason the fact of having dpdk in the host appears to make some difference."

      — Additional comment from Karthik Sundaravel on 2024-10-17 17:47:11 UTC —

      Hi Madhur,

      When DPDK is enabled, all the DPDK-capable interfaces are probed by openvswitch, which appears to be one of the reasons why interfaces that are not part of the OvS ports are getting impacted.

      We have tried a configuration that could limit the probes [1], which helps solve the issue in our dev setups (CX5 Ex cards). Can the same be verified on the customer's staging environment, which has CX6 Lx cards?

      Meanwhile, we have started an internal discussion on analysing the side effects of using this configuration [1].

      [1] ovs-vsctl set o . other_config:dpdk-extra="-a 0000:00:00.0"
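
      For context, "-a 0000:00:00.0" hands DPDK an EAL allow-list containing only a PCI address that does not correspond to any DPDK-capable NIC, so OVS-DPDK should stop probing the other PCI devices (including the Mellanox PFs) at startup. A sketch of applying and checking it, using commands already shown in this ticket:

      ovs-vsctl set Open_vSwitch . other_config:dpdk-extra="-a 0000:00:00.0"
      systemctl restart openvswitch
      ovs-vsctl list Open_vSwitch | grep other_config   # confirm dpdk-extra is present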

      — Additional comment from Karthik Sundaravel on 2024-10-18 04:37:10 UTC —

      Hi Benjamin

      Here are the steps performed on a standalone machine to reproduce the issue on CX5 cards.

      Prerequisites
      ---------------
      RHEL 9.2 (5.14.0-284.66.1.el9_2.x86_64)
      Python 3.9
      Python3-pip
      openstack-network-scripts
      Openvswitch
      systemctl start openvswitch
      systemctl enable openvswitch
      systemctl enable network
      ovs-vsctl set o . other_config:dpdk-init=true
      systemctl restart openvswitch

      Download and install os-net-config
      ----------------------------------
      git clone https://github.com/os-net-config/os-net-config.git -b stable/wallaby
      pip install pyroute2 jsonschema oslo_concurrency
      cd os-net-config
      python setup.py install --prefix=/usr

      Generate the config.yaml
      -----------------------
      Download the config.yaml from the BZ and modify 'CHANGEME' to appropriate nics/vlans/ip address.
      The nic mapping could be found by running 'os-net-config -i'

      Generate the ifcfgs
      --------------------
      os-net-config -c ~/config.yaml -p ifcfg -d

      Test
      ----
      Run a ping test from one machine to another.
      The ping test fails.

      Workaround to enable ping
      -------------------------
      Option A:
      ovs-vsctl set o . other_config:dpdk-extra="-a 0000:00:00.0"
      systemctl restart openvswitch
      Check if ping works; if not, run 'systemctl restart network'.

      Option B:
      ip link set dev <vf device> promisc on

      Option C: (may not always work)
      ifdown <first member of the bond>

      — Additional comment from Kenny Tordeurs on 2024-10-18 08:57:28 UTC —

      (In reply to Karthik Sundaravel from comment #32)
      > Hi Madhur,
      >
      > When DPDK is enabled, all the dpdk capable interfaces would be probed by
      > openvswitch, which appears to be one of the reason why the interfaces not
      > part of the OvS ports are getting impacted.
      >
      > We have tried a configuration that could limit the probes [1], which helps
      > solve the issue in our dev setups (Cx5 EX cards). Can the same be verified
      > on the customer's staging environment which has CX6 LX cards.
      >
      > Meanwhile we have started a discussion internally, on analysing the side
      > effects of using this [1] configuration.
      >
      > [1] ovs-vsctl set o . other_config:dpdk-extra="-a 0000:00:00.0"

      Hi Karthik, thanks for the workaround, which the customer applied, but it did require a reboot (not sure if we can simply restart a service instead?).

      Thanks

      — Additional comment from Karthik Sundaravel on 2024-10-18 10:56:01 UTC —

      Can you please perform
      'systemctl restart openvswitch'?
      If that still does not help, 'systemctl restart network' or a reboot may be required.

      — Additional comment from Karthik Sundaravel on 2024-10-21 10:40:59 UTC —

      Hi Kenny

      Can you please confirm whether the suggested workaround solved the connectivity issue on the Linux bond (NIC-partitioned)?

      Regards
      Karthik S

      — Additional comment from Karthik Sundaravel on 2024-10-21 12:01:56 UTC —

      Hi Miguel / Eran

      In [1], we have prepared the steps to apply the workaround for this BZ. We need to test the workaround in a few scenarios for functionality and performance:
      a) NIC Partitioning on Mellanox nics + DPDK on mellanox nics
      b) NIC Partitioning on Mellanox NICS + DPDK on Intel nics
      c) DPDK on Intel NICs (where nics are bound with vfio-pci)
      d) DPDK on Mellanox Nics

      [1] https://docs.google.com/document/d/1hCwSnCFtBjdBvGSSG71SMUyYWFCZM-90UXOhMb2an38/edit?usp=sharing

      — Additional comment from Kenny Tordeurs on 2024-10-21 14:30:40 UTC —

      (In reply to Karthik Sundaravel from comment #36)
      > Hi Kenny
      >
      > Can you please confirm it the suggested workaround solved the connectivity
      > issue on the linux bond (NIC partitioned)
      >
      > Regards
      > Karthik S

      Yes "ovs-vsctl set o . other_config:dpdk-extra="-a 0000:00:00.0"" does solve the issue.
      Once applied, promisc mode is not needed anymore.

      BUT 'systemctl restart openvswitch' and 'systemctl restart network' were not enough to make it work; a reboot was needed to get it working.

      I'm only wondering if the openvswitch service is the correct one to restart; if you look at the currently running OVS-related services:
      [root@oscar05com002 ~]# systemctl status ovs|grep service
      _ ovsdb-server.service - Open vSwitch Database Unit
      _ ovs-vswitchd.service - Open vSwitch Forwarding Unit
      _ ovs-delete-transient-ports.service - Open vSwitch Delete Transient Ports

      [root@oscar05com002 ~]# systemctl status vswitch|grep service
      _ openvswitch.service - Open vSwitch

      Additional questions:

      • How can we enable this by default or is this only a workaround?
      • Will this fix survive a RHEL9 leapp upgrade ?

      — Additional comment from Karthik Sundaravel on 2024-10-21 16:11:02 UTC —

      Hi Kenny

      We are in the process of getting this configuration applied from the TripleO deployment [1], which should take care of the reboots on new nodes.
      Before we suggest this workaround, we need it to be verified for regression and performance.
      I think that, this being an OVS DB change, the values should be retained after the leapp upgrade. However, that could be verified as well.

      [1] https://docs.google.com/document/d/1hCwSnCFtBjdBvGSSG71SMUyYWFCZM-90UXOhMb2an38/edit?usp=sharing
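
      For what it's worth, whether the value survived an upgrade or redeploy can be checked directly in the OVS DB (a sketch):

      ovs-vsctl get Open_vSwitch . other_config:dpdk-extra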

      — Additional comment from Nate Johnston on 2024-10-24 12:37:40 UTC —

      Viji has a patch up to allow OVS extra options to be configured in the template to fix this. Requested blocker status since this is needed for the RHOSPPRIO Belgacom escalation.

      — Additional comment from RHEL Program Management on 2024-10-24 12:37:50 UTC —

      This bugzilla has been removed from the release since it does not have an acked release flag. For details, see https://mojo.redhat.com/docs/DOC-1144661#jive_content_id_OSP_Release_Planning.

      — Additional comment from Madhur Gupta on 2024-10-24 12:42:39 UTC —

      (In reply to Karthik Sundaravel from comment #39)
      > Hi Kenny
      >
      > We are in the process of getting this configuration from Tripleo deployment
      > [1], which should take care of the reboots in new nodes.
      > Before we suggest this workaround we need it to be verified for regression
      > and performance.
      > I think this being a ovs-db change should retain the values after leap
      > upgrade. However it could be verified as well.
      >
      >
      > [1]
      > https://docs.google.com/document/d/1hCwSnCFtBjdBvGSSG71SMUyYWFCZM-
      > 90UXOhMb2an38/edit?usp=sharing

      Hello Team,

      The customer has done some extra testing:

      When you run 'ovs-vsctl set o . other_config:dpdk-extra="-a 0000:00:00.0"' you get the following config:
      [root@oscar05com002 tripleo-admin]# ovs-vsctl list Open_vSwitch|grep other_config
      other_config :

      {dpdk-extra="-a 0000:00:00.0", dpdk-init="true", dpdk-socket-limit="4096", dpdk-socket-mem="4096", ovn-chassis-idx-b2204b60-253b-4654-b0e2-2460839a7402="", pmd-cpu-mask="1c0000000000000000000000000000001c", vhost-postcopy-support="true", vlan-limit="0"}

      They tried restarting a large number of services, but have not yet found anything that makes the networking work without doing a reboot.

      Indeed, as Dave pointed out, when you run a deploy again your extra config gets erased and you get reverted to:
      [root@oscar05com002 ~]# ovs-vsctl list Open_vSwitch|grep other_config
      other_config :

      {dpdk-extra=" -n 12", dpdk-init="true", dpdk-socket-limit="4096", dpdk-socket-mem="4096", ovn-chassis-idx-b2204b60-253b-4654-b0e2-2460839a7402="", pmd-cpu-mask="1c0000000000000000000000000000001c", vhost-postcopy-support="true", vlan-limit="0"}

      Also, in this direction all network connectivity keeps working fine until you reboot the host.
      Once rebooted, all connectivity will indeed be lost again.

      All in all, I haven't found a negative effect of this extra config yet,
      except the fact that it's not permanent and can't be applied on the fly for the moment.

      — Additional comment from Karthik Sundaravel on 2024-10-24 15:08:07 UTC —

      Hi Madhur,

      Thanks for the input. We are planning to ship this workaround as part of the deployment itself by exposing the internal tripleo-ansible parameter `tripleo_ovs_dpdk_extra` via THT parameters. That should take care of updates.

      Regards
      Karthik S
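
      For illustration only, the kind of Heat environment snippet this could enable might look like the following; the parameter name is hypothetical, since the only name confirmed in this ticket is the internal tripleo-ansible variable tripleo_ovs_dpdk_extra:

      parameter_defaults:
        # Hypothetical THT parameter name; whichever parameter ends up exposing
        # tripleo_ovs_dpdk_extra would carry the workaround value from this BZ.
        OvsDpdkExtra: "-a 0000:00:00.0"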

      — Additional comment from RHEL Program Management on 2024-10-25 16:27:40 UTC —

      This item has been properly Triaged and planned for the release, and Target Release is now set to match the release flag.

      — Additional comment from Mike Burns on 2024-10-25 16:28:57 UTC —

      TRAC approved blocker https://issues.redhat.com/browse/OSP-33001

      — Additional comment from Benjamin Poirier on 2024-10-25 22:02:49 UTC —

      By using Karthik's instructions, I was able to reproduce the problem at
      Nvidia. I was also able to simplify the instructions so that os-net-config is
      not needed:

      Prepare host 1:
      subscription-manager repos --enable fast-datapath-for-rhel-9-x86_64-rpms
      dnf install --allowerasing -y openvswitch3.3

      grubby --update-kernel ALL --args="hugepages=512"
      grub2-mkconfig -o /boot/grub2/grub.cfg

      systemctl start openvswitch.service
      ovs-vsctl set o . other_config:dpdk-init=true

      reboot

      Prepare host 2:
      ip link set dev eth2 up
      ip addr add 192.168.1.2/24 dev eth2

      Reproduce problem on host 1:
      echo 1 > /sys/class/net/eth2/device/sriov_numvfs
      systemctl start openvswitch.service
      ip link set dev eth4 up # eth4 is the new vf netdev
      ip addr add 192.168.1.1/24 dev eth4

      From host 2, ping 192.168.1.1. Does not work, rx_steer_missed_packets
      increases.

      As we can see, vlan and bond are not needed to reproduce the problem.

      Also, if we change the reproduction command sequence to:
      systemctl start openvswitch.service
      echo 1 > /sys/class/net/eth2/device/sriov_numvfs
      ip link set dev eth4 up
      ip addr add 192.168.1.1/24 dev eth4

      The result is good. So the problem seems related to something that ovs
      configures at startup.

      > I tried to dump the steering rules on the adapter using 'mlxdump fsdump' but
      > it did not work. I opened a ticket for this at Nvidia (RM-4124320).

      It did not work because a special license is needed. I was able to run the
      tool on Nvidia systems. In both the bad and good cases above, the steering
      rules are almost the same. The only difference is related to the vf mac
      address which changes each time the vf is created. So this did not provide an
      insight on why traffic is dropped. I asked my coworkers for advice on how to
      get more info on why the rx_steer_missed_packets counter is increasing but
      didn't get any reply. Note that many of them are on vacation.

      Meanwhile, I also tried different ovs package versions on RHEL-9 and noticed
      that the problem also reproduces with openvswitch3.1 but not with
      openvswitch3.0.

      I reproduced the issue using upstream ovs and dpdk releases and, after testing
      various combinations, narrowed it down to the following two:

      • openvswitch-3.0.7 + dpdk-21.11.8: good
      • openvswitch-3.0.7 + dpdk-22.03: bad

      I then bisected on the dpdk repository which identified the following commit:
      87af0d1e1bcc15ca414060263091a0f880ad3a86 is the first bad commit
      commit 87af0d1e1bcc15ca414060263091a0f880ad3a86
      Author: Michael Baum <michaelba@nvidia.com>
      Date: Mon Feb 14 11:35:06 2022 +0200

      net/mlx5: concentrate all device configurations

      Move all device configure to be performed by mlx5_os_cap_config()
      function instead of the spawn function.
      In addition move all relevant fields from mlx5_dev_config structure to
      mlx5_dev_cap.

      Signed-off-by: Michael Baum <michaelba@nvidia.com>
      Acked-by: Matan Azrad <matan@nvidia.com>

      I will contact the respective developers.

      — Additional comment from Eran Kuris on 2024-10-27 07:20:08 UTC —

      (In reply to Greg Rakauskas from comment #28)
      > Hi Eran,
      >
      > Will this BZ be verified for 17.1.4?
      >
      > We need to know whether to add this BZ to the RHOSP 17.1.4 Release Notes.
      >
      > Thanks,
      > --Greg R.

      Hi Greg,
      It depends on Nvidia, as you can see in the above comments.
      Maybe we will be able to provide a workaround until we have an official fix.

      — Additional comment from errata-xmlrpc on 2024-11-01 18:34:46 UTC —

      Bug report changed to ON_QA status by Errata System.
      A QE request has been submitted for advisory RHSA-2024:138124-01
      https://errata.engineering.redhat.com/advisory/138124

      — Additional comment from errata-xmlrpc on 2024-11-01 18:34:55 UTC —

      This bug has been added to advisory RHSA-2024:138124 by Jason Joyce (jjoyce@redhat.com)

      — Additional comment from Benjamin Poirier on 2024-11-04 13:53:52 UTC —

      > I will contact the respective developers.

      I explained the issue to Michael Baum last week. He later said that he reviewed
      the commit and did not find a problem.

      We (Inbox team) are still trying to get help from someone who is familiar with
      OVS and/or dpdk.

      — Additional comment from Kenny Tordeurs on 2024-11-04 15:11:05 UTC —

      Adding the following information here:

      Updating the firmware to version 26.41.1000 resolved the issue based on https://access.redhat.com/solutions/7063133

      — Additional comment from Kenny Tordeurs on 2024-11-04 15:24:23 UTC —

      (In reply to Kenny Tordeurs from comment #51)
      > Adding the following information here:
      >
      > Updating the firmware to version 26.41.1000 resolved the issue based on
      > https://access.redhat.com/solutions/7063133

      The firmware update only fixed the issue around flooding of the STP packets; sorry for the confusion.
