Red Hat OpenStack Services on OpenShift
OSPRH-12454

BZ#2323844 [bug][RHOS17.1] Infra vlans not working when deploying a compute with its bond on a nic-partitioned vf

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Normal
    • Target: rhos-17.1.z
    • Team: rhos-connectivity-nfv
    • Severity: Important

      This BZ is cloned for tracking the fix from openvswitch, while the original BZ#2310427 will be used for the proposed workaround in tripleo.

      +++ This bug was initially created as a clone of Bug #2310427 +++

      Description of problem:

      • We set up NIC partitioning to run bond0, with all the infra VLANs, on top of two VFs.
        The network configuration is as follows:
        [Diagram: one VF from the PF on ConnectX-6 Lx NIC#1 and one VF from the PF on ConnectX-6 Lx NIC#2 are enslaved to bond0 (mode=1); vlan39XX on top of bond0 carries 192.168.2.X]
      • This setup works fine on osp16.2 / RHEL8.4.
      • On osp17.1/RHEL9.2 this works only if the VF is in promisc mode:
        10: p1p1_0: <BROADCAST,MULTICAST,PROMISC,SLAVE,UP,LOWER_UP> mtu 9050 qdisc mq master bond0 state UP group default qlen 1000
      • The NFV docs mention that you can indeed put the VF in promiscuous mode,
        but they do not specify whether this is required in order to run infra VLANs on top of it.

      Version-Release number of selected component (if applicable):
      [redhat-release] Red Hat Enterprise Linux release 9.2 (Plow)
      [rhosp-release] Red Hat OpenStack Platform release 17.1.3 (Wallaby)
      openvswitch3.1-3.1.0-104.el9fdp.x86_64

      How reproducible:
      Every time.

      Steps to Reproduce:

      • OSP17.1 environments that have been upgraded from OSP16.2

      Actual results:
      It works only if we:

      • set the VF in promisc mode (sketched just below), or
      • disable openvswitch:
        . systemctl enable tripleo*
        . systemctl disable openvswitch.service
        . moved away /usr/lib/systemd/system/ovsdb-server.service
        . moved away /usr/lib/systemd/system/ovs-delete-transient-ports.service
        . moved away /usr/lib/systemd/system/ovs-vswitchd.service
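
      A minimal sketch of the promisc part of this workaround (the VF netdev name p1p1_0 is taken from the example above; the name of the second bond member's VF is an assumption and will differ per host):

      ip link set dev p1p1_0 promisc on   # first bond member VF (name from the example above)
      ip link set dev p1p2_0 promisc on   # assumed name of the second bond member's VF
      ip -d link show dev p1p1_0          # the PROMISC flag should now be listed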

      Expected results:
      It should work as on 16.2 with the exact same configuration.

      Additional info:

      • The problem reproduces as well if the VLAN is configured on top of the VF (without bonding):
        [Diagram: ConnectX-6 Lx NIC#1: PF → VF → vlan39xx → 192.168.2.x]
      • If the VLAN is configured on top of the PF interface, everything works and no promisc mode is needed:
        [Diagram: ConnectX-6 Lx NIC#1: PF → vlan39xx → 192.168.2.x]
      • Firmware versions tried on osp17:
      • 26.41.1002
      • 26.39.1002
      • 26.38.1002
      • 26.36.1010

      — Additional comment from Luigi Tamagnone on 2024-09-06 15:00:57 UTC —

      • sos report of oscar23com089, which was deployed under osp17 (not working without promisc)
      • sos report of oscar22com240 running on osp16.2 (working perfectly fine)

      Detailed description from the customer in c#66:
      """
      So the openstack installation exists in 2 parts: the OS
      installation + network deployment, and the OSP installation.

      1. OS + network which gets done with the command: openstack overcloud node provision

      • After this phase the network is working fine in NON promiscuous mode
        (sosreport-oscar23com088-2024-08-20-dlpydhr.tar.xz)
      • After a reboot the network is still working fine in NON promiscuous mode
        (sosreport-oscar23com088-2024-08-20-wicnavi.tar.xz)

      This proves to me that there is no issue with the image, the actual way
      we do the bonding, or any firmware thing, as in this phase you can
      still reboot as much as you like and it stays working.

      2. OSP deployment which gets done with the command: openstack overcloud deploy

      • After this phase the network is working fine in NON promiscuous mode
        (sosreport-oscar23com088-2024-08-22-pzzzzrp.tar.xz)
      • After a reboot the network is broken in NON promiscuous mode
        (sosreport-oscar23com088-2024-08-22-jrmuqfi.tar.xz)

      This shows that some of the settings that the OSP deploy puts in place,
      which take effect only after the reboot, break the networking in NON
      promiscuous mode.
      """

      — Additional comment from Ella Shulman on 2024-09-08 11:43:39 UTC —

      Hi, can you please specify what is referred to in this ticket as the infra network? Also, sharing the templates would help a lot in understanding and reproducing the issue.

      — Additional comment from Luigi Tamagnone on 2024-09-09 08:35:59 UTC —

      > can you please specify what is referred to in this ticket as the infra network?
      The management, storage, and tenant VLAN networks are attached to bond0.

      The templates are on case 03890610; the files and the bash script used for the deploy are in templates-osp17.tar.gz.

      — Additional comment from Benjamin Poirier on 2024-09-10 14:45:12 UTC —

      I passed on the information from this ticket to Maor Dickman from Nvidia. He thinks this issue is not related to OVS and he asked:
      > Did you tried to reproduce with simple OVS configuration? Or Legacy SRIOV?

      — Additional comment from Ella Shulman on 2024-09-10 15:21:07 UTC —

      Hi Luigi

      I took a deeper look into the case. The reason it is failing is that you cannot co-allocate the tenant network like this when using VFs; please use a separate NIC for the tenant network. I'll add a request for additional doc text on this.

      BR

      — Additional comment from Luigi Tamagnone on 2024-09-11 12:39:14 UTC —

      I think there was a misunderstanding between dpdkbond0[1][2] and bond0[3][4].
      The tenant network is on dpdkbond0 and some VLANs are on bond0.

      The issue is on bond0

      [1] osp16
      {
        "addresses": [
          { "ip_netmask": "192.168.32.242/22" }
        ],
        "members": [
          {
            "members": [
              {
                "driver": "mlx5_core",
                "members": [
                  { "name": "p2p1", "type": "interface" }
                ],
                "mtu": 9050,
                "name": "dpdk0",
                "type": "ovs_dpdk_port"
              },
              {
                "driver": "mlx5_core",
                "members": [
                  { "name": "p2p2", "type": "interface" }
                ],
                "mtu": 9050,
                "name": "dpdk1",
                "type": "ovs_dpdk_port"
              }
            ],
            "mtu": 9050,
            "name": "dpdkbond0",
            "ovs_options": "bond_mode=balance-slb lacp=active other_config:lacp-time=fast",
            "rx_queue": 4,
            "type": "ovs_dpdk_bond"
          }
        ],
        "name": "br-ex",
        "ovs_extra": [
          "set port br-ex tag=3955"
        ],
        "type": "ovs_user_bridge",
        "use_dhcp": false
      },
      [2] osp17

      - addresses:
        - ip_netmask: 192.168.1.208/24
        members:
        - members:
          - driver: mlx5_core
            members:
            - name: p2p1
              type: interface
            mtu: 9050
            name: dpdk0
            type: ovs_dpdk_port
          - driver: mlx5_core
            members:
            - name: p2p2
              type: interface
            mtu: 9050
            name: dpdk1
            type: ovs_dpdk_port
          mtu: 9050
          name: dpdkbond0
          ovs_options: bond_mode=balance-slb lacp=active other_config:lacp-time=fast
          rx_queue: 1
          type: ovs_dpdk_bond
        name: br-ex
        ovs_extra: set port br-ex tag=3955
        type: ovs_user_bridge
        use_dhcp: false
      [3] osp16
      (bond0 configuration with "bonding_options" and "addresses" sections; the JSON snippet did not render in this ticket)

      [4] osp17

      - bonding_options: miimon=100 mode=1
        dns_servers: ['10.34.255.252', '10.34.255.253']
        domain: []
        members:
        - device: p1p1
          type: sriov_vf
          vfid: 0
          promisc: false
        - device: p1p2
          type: sriov_vf
          vfid: 0
          promisc: false
        mtu: 9050
        name: bond0
        type: linux_bond
        use_dhcp: false
      - addresses:
        - ip_netmask: 192.168.2.67/24
        device: bond0
        mtu: 9000
        type: vlan
        vlan_id: 3951

      — Additional comment from Karthik Sundaravel on 2024-09-18 16:02:12 UTC —

      Hi Luigi

      Can we add the `primary` field in the bond and check the behaviour?

      - bonding_options: miimon=100 mode=1
        dns_servers: ['10.34.255.252', '10.34.255.253']
        domain: []
        members:
        - device: p1p1
          type: sriov_vf
          vfid: 0
          promisc: false
          primary: true
        - device: p1p2
          type: sriov_vf
          vfid: 0
          promisc: false
        mtu: 9050
        name: bond0
        type: linux_bond
        use_dhcp: false
      - addresses:
        - ip_netmask: 192.168.2.67/24
        device: bond0
        mtu: 9000
        type: vlan
        vlan_id: 3951

      — Additional comment from Luigi Tamagnone on 2024-09-19 10:06:37 UTC —

      @Karthik Unfortunately, there is no change in behaviour; the customer still doesn't have network connectivity.

      — Additional comment from Benjamin Poirier on 2024-09-20 14:40:51 UTC —

      I tried a few different ways based on the ascii art diagrams and the problem
      did not reproduce. For instance, I tried the following:

      devlink dev eswitch set pci/0000:08:00.0 mode switchdev
      echo 1 > /sys/bus/pci/devices/0000:08:00.0/sriov_numvfs
      udevadm settle
      ip link add br0 up type bridge
      ip link set dev eth2 up master br0 # PF
      ip link set dev eth4 up master br0 # VF PR
      ip link set dev eth5 up # actual VF
      ip addr add 192.168.1.1/24 dev eth5
      ping -c4 192.168.1.2 # ok
      ip link add eth5.39 link eth5 up type vlan id 39
      ip addr add 192.168.2.1/24 dev eth5.39
      ping -c4 192.168.2.2 # ok
      systemctl start openvswitch.service
      ip link show dev eth5 # no "PROMISC" flag
      ping -c4 192.168.2.2 # ok

      In the above, I used kernel 5.14.0-284.30.1.el9_2.x86_64, adapter CX-6 Lx with
      firmware 26.41.1000.

      Presumably, more specific openvswitch configuration is needed to reproduce the
      problem but I can't guess what it is, especially given that I have next to no
      experience with OVS.

      Can you try to simplify the reproduction environment (i.e. without OSP)
      and provide detailed reproduction instructions?

      — Additional comment from Karthik Sundaravel on 2024-09-23 09:27:37 UTC —

      Hi Benjamin,

      I'll try to make a simplified reproducer without OSP.
      The issue is seen with legacy SR-IOV and not switchdev.
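
      For reference, which eswitch mode a PF is in can be checked with devlink (a sketch; the PCI address is the one used in the earlier non-OSP reproduction attempt and must be adapted to the host under test):

      devlink dev eswitch show pci/0000:08:00.0   # reports "mode legacy" or "mode switchdev"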

      — Additional comment from Karthik Sundaravel on 2024-09-25 12:22:24 UTC —

      Hi Luigi,

      Can you please share the model of the NIC - CX6 or CX5?

      Karthik

      — Additional comment from Luigi Tamagnone on 2024-09-25 13:07:22 UTC —

      It should be
      3 dual-port (6) Mellanox Technologies MT2894 Family [ConnectX-6 Lx] [15b3:101f]

      — Additional comment from on 2024-09-25 15:13:19 UTC —

      Team,

      I am enabling the escalation flag on this bug, as the case assigned to it was escalated by the TAM of the Belgacom customer.

      This issue will be a big problem when they upgrade their telco cloud cluster 2, which is planned for 1 November. That still allows us some time, BUT we have been going on for about 2 months with this case and have not yet come to a conclusion, hence they would like to re-engage our attention on this issue now that the urgent problems have been resolved post-upgrade. We really need to know what's going on and how we should proceed, as we estimate that almost 40 servers of telco cluster 2 will be impacted.

      If the bug could be prioritized, that would be appreciated.

      Regards,

      Joanna

      Senior Escalation Manager

      — Additional comment from Nate Johnston on 2024-09-26 12:49:36 UTC —

      @jfindysz@redhat.com If this is a priority engineering escalation please follow the RHOS Prio escalation steps at https://spaces.redhat.com/display/RHOSPRIO/RHOSP+Priority+List+%28v2.0%29+Workflow for proper engineering engagement at an escalated level.

      — Additional comment from Karthik Sundaravel on 2024-09-26 17:55:43 UTC —

      — Additional comment from Karthik Sundaravel on 2024-09-26 18:02:19 UTC —

      Steps to reproduce
      ------------------
      STEP 1) Download the config file. Please modify the entries tagged with "=> CHANGE ME". (A hypothetical sketch of such a file is shown at the end of these steps.)
      STEP 2)
      Download os-net-config from the git repo https://github.com/os-net-config/os-net-config.git
      cd os-net-config; git fetch -v --all; git switch -c stable/wallaby origin/stable/wallaby
      python setup.py install --prefix=/usr
      os-net-config -d -c <path to the config file>
      Dependencies:
      Python 3.7.0 or higher is required. Other modules can be installed via pip.

      STEP 3) Repeat the above steps on a second machine with a different IP address.

      STEP 4) Ping from one machine to another. It works now.

      STEP 5) Reboot one machine. Ping doesn't work.

      STEP 6) On the rebooted machine, do
      ip link set dev <device name for VF-id> promisc on => repeat this for second interface as well.
      Ping works now.

      or

      ip link set dev <device name for VF-id> down => Ping works in my setup
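
      For readers without access to the attached config file, a minimal hypothetical sketch of what such a file might contain, modelled on the [4] osp17 bond0 snippet quoted earlier in this ticket (interface names, VLAN ID, and addresses are placeholders to be adapted):

      network_config:
      - type: sriov_pf
        name: p1p1           # => CHANGE ME (first PF)
        mtu: 9050
        numvfs: 4
      - type: sriov_pf
        name: p1p2           # => CHANGE ME (second PF)
        mtu: 9050
        numvfs: 4
      - type: linux_bond
        name: bond0
        mtu: 9050
        use_dhcp: false
        bonding_options: miimon=100 mode=1
        members:
        - type: sriov_vf
          device: p1p1
          vfid: 0
          promisc: false
        - type: sriov_vf
          device: p1p2
          vfid: 0
          promisc: false
      - type: vlan
        device: bond0
        vlan_id: 3951        # => CHANGE ME
        mtu: 9000
        addresses:
        - ip_netmask: 192.168.2.67/24   # => CHANGE ME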

      Kernel version
      [tripleo-admin@compute-0 ~]$ uname -r
      5.14.0-284.82.1.el9_2.x86_64

      Driver/FW version:
      [tripleo-admin@compute-0 ~]$ ethtool -i ens2f0np0
      driver: mlx5_core
      version: 5.14.0-284.82.1.el9_2.x86_64
      firmware-version: 26.36.1010 (MT_0000000532)
      expansion-rom-version:
      bus-info: 0000:17:00.0
      supports-statistics: yes
      supports-test: yes
      supports-eeprom-access: no
      supports-register-dump: no
      supports-priv-flags: yes

      Device: Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]

      I did the above steps to reproduce the issue.

      — Additional comment from Benjamin Poirier on 2024-09-27 21:37:07 UTC —

      I followed the instructions in comment 16 but faced a few errors and
      ultimately there was no "mellanox_bond" interface.

      I used an up to date RHEL-9.2 install. Here are the commands that I ran:

      1. git clone https://github.com/os-net-config/os-net-config.git
      2. cd os-net-config/
      3. git fetch -v --all
      4. git switch -c stable/wallaby origin/stable/wallaby
      5. python setup.py install --prefix=/usr
      6. os-net-config -d -c ~/config_mellanox_no_promisc.yaml
      7. pip install oslo_concurrency
      8. os-net-config -d -c ~/config_mellanox_no_promisc.yaml
      9. pip install pyudev
      10. os-net-config -d -c ~/config_mellanox_no_promisc.yaml
      11. pip install jsonschema
      12. os-net-config -d -c ~/config_mellanox_no_promisc.yaml
        [...]
        NoneType: None
        Traceback (most recent call last):
        File "/usr/bin/os-net-config", line 10, in <module>
        sys.exit(main())
        File "/usr/lib/python3.9/site-packages/os_net_config/cli.py", line 360, in main
        pf_files_changed = provider.apply(cleanup=opts.cleanup,
        File "/usr/lib/python3.9/site-packages/os_net_config/impl_ifcfg.py", line 2020, in apply
        self.ifdown(interface)
        File "/usr/lib/python3.9/site-packages/os_net_config/_init_.py", line 500, in ifdown
        self.execute(msg, '/sbin/ifdown', interface, check_exit_code=False)
        File "/usr/lib/python3.9/site-packages/os_net_config/_init_.py", line 480, in execute
        out, err = processutils.execute(cmd, *args, **kwargs)
        File "/usr/local/lib/python3.9/site-packages/oslo_concurrency/processutils.py", line 401, in execute
        obj = subprocess.Popen(cmd,
        File "/usr/lib64/python3.9/subprocess.py", line 951, in _init_
        self._execute_child(args, executable, preexec_fn, close_fds,
        File "/usr/lib64/python3.9/subprocess.py", line 1821, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
        FileNotFoundError: [Errno 2] No such file or directory: '/sbin/ifdown'
      13. dnf install -y NetworkManager-initscripts-updown
      14. os-net-config -d -c ~/config_mellanox_no_promisc.yaml
        [...]
        2024-09-28 00:14:00.203 INFO os_net_config.execute running ifup on interface: enp8s0f0v1
        2024-09-28 00:14:00.394 INFO os_net_config.execute running ifup on interface: enp8s0f1v1
        2024-09-28 00:14:00.582 INFO os_net_config.execute running ifup on interface: mellanox_bond
        2024-09-28 00:14:00.612 ERROR os_net_config.impl_ifcfg.apply Failure(s) occurred when applying configuration
        2024-09-28 00:14:00.612 ERROR os_net_config.impl_ifcfg.apply stdout: , stderr: Error: unknown connection '/etc/sysconfig/network-scripts/ifcfg-enp8s0f0v1'.
        Failure to activate file "enp8s0f0v1"!

      See all profiles with `nmcli connection`.
      Reload files from disk with `nmcli connection reload`
      Activate the desired profile with `nmcli connection up \"$NAME\"`

      2024-09-28 00:14:00.612 ERROR os_net_config.impl_ifcfg.apply stdout: , stderr: Error: unknown connection '/etc/sysconfig/network-scripts/ifcfg-enp8s0f1v1'.
      Failure to activate file "enp8s0f1v1"!

      See all profiles with `nmcli connection`.
      Reload files from disk with `nmcli connection reload`
      Activate the desired profile with `nmcli connection up \"$NAME\"`

      2024-09-28 00:14:00.612 ERROR os_net_config.impl_ifcfg.apply stdout: , stderr: Error: unknown connection '/etc/sysconfig/network-scripts/ifcfg-mellanox_bond'.
      Failure to activate file "mellanox_bond"!

      See all profiles with `nmcli connection`.
      Reload files from disk with `nmcli connection reload`
      Activate the desired profile with `nmcli connection up \"$NAME\"`

      2024-09-28 00:14:00.612 ERROR os_net_config.main **Failed to configure with ifcfg provider**
      ConfigurationError('Failure(s) occurred when applying configuration')
      2024-09-28 00:14:00.612 ERROR os_net_config.common.log_exceptions Traceback (most recent call last):
      File "/usr/bin/os-net-config", line 10, in <module>
      sys.exit(main())
      File "/usr/lib/python3.9/site-packages/os_net_config/cli.py", line 392, in main
      files_changed = provider.apply(cleanup=opts.cleanup,
      File "/usr/lib/python3.9/site-packages/os_net_config/impl_ifcfg.py", line 2147, in apply
      raise os_net_config.ConfigurationError(message)
      os_net_config.ConfigurationError: Failure(s) occurred when applying configuration
      NoneType: None
      Traceback (most recent call last):
      File "/usr/bin/os-net-config", line 10, in <module>
      sys.exit(main())
      File "/usr/lib/python3.9/site-packages/os_net_config/cli.py", line 392, in main
      files_changed = provider.apply(cleanup=opts.cleanup,
      File "/usr/lib/python3.9/site-packages/os_net_config/impl_ifcfg.py", line 2147, in apply
      raise os_net_config.ConfigurationError(message)
      os_net_config.ConfigurationError: Failure(s) occurred when applying configuration

      15. ls /etc/sysconfig/network-scripts/
        ifcfg-enp8s0f0np0 ifcfg-enp8s0f0v1 ifcfg-enp8s0f1np1 ifcfg-enp8s0f1v1 ifcfg-mellanox_bond readme-ifcfg-rh.txt
      16. nmcli con
        NAME UUID TYPE DEVICE
        enp5s0 bbb03040-9469-4436-9537-4e6ecafadeff ethernet enp5s0
        enp4s0 d05673ca-6f4f-44be-ae6b-353b18a83f1d ethernet enp4s0
        lo 205e4428-2079-4e7a-89da-4cb811c0ce8d loopback lo
        System enp8s0f0np0 8cfe20f3-2c47-a269-16cf-ed6e17919c74 ethernet enp8s0f0np0
        System enp8s0f1np1 a3c65d4a-a91d-7bd5-63bd-f6f55fd22cc8 ethernet enp8s0f1np1

      All of the ifcfg-* files under /etc/sysconfig/network-scripts/ were created by
      os-net-config but NetworkManager only loads ifcfg-enp8s0f0np0 and
      ifcfg-enp8s0f1np1. I noticed this difference:

      17. grep NM_CONTROLLED ifcfg-*
        ifcfg-enp8s0f0np0:NM_CONTROLLED=yes
        ifcfg-enp8s0f0v1:NM_CONTROLLED=no
        ifcfg-enp8s0f1np1:NM_CONTROLLED=yes
        ifcfg-enp8s0f1v1:NM_CONTROLLED=no
        ifcfg-mellanox_bond:NM_CONTROLLED=no

      So it seems expected that NetworkManager will not load some of those files.

      Do the files have similar content when you follow the instructions? Does
      NetworkManager load them?

      Since you did not mention installing NetworkManager-initscripts-updown, is it
      expected that I ran into the first quoted error (FileNotFoundError) before
      installing that package?

      Let me know if you have some additional suggestions.

      — Additional comment from Karthik Sundaravel on 2024-09-28 10:30:41 UTC —

      Hi Benjamin,

      In OSP, we use the package openstack-network-scripts (aka initscripts) for the ifup / ifdown commands.
      So please remove the package `NetworkManager-initscripts-updown` and install openstack-network-scripts.

      I fetched the version of openstack-network-scripts from another system, and it should be more or less the same as the one I have used for reproducing the issue.

      Name : openstack-network-scripts
      Version : 10.11.1
      Release : 9.17_1.1.el9ost
      Architecture : x86_64
      Size : 161 k
      Source : openstack-network-scripts-10.11.1-9.17_1.1.el9ost.src.rpm
      Repository : @System
      From repo : rhos-17.1
      Summary : Legacy scripts for manipulating of network devices
      URL : https://github.com/fedora-sysv/initscripts
      License : GPLv2

      — Additional comment from Nate Johnston on 2024-10-01 20:56:55 UTC —

      Adding link to RHOS Prio ticket

      — Additional comment from Benjamin Poirier on 2024-10-02 20:59:10 UTC —

      I installed os-net-config from the rhoso-18.0-for-rhel-9-x86_64-rpms
      repository.

      > STEP 4) Ping from one machine to another. It works now.

      Indeed

      > STEP 5) Reboot one machine. Ping doesn't work.

      After reboot, the mellanox_bond interface does not exist.
      I started the "network" init script (part of os-net-config) manually but it
      reported some errors and failed:

      Oct 02 23:30:49 c-236-4-240-243 network[2110]: Bringing up interface mellanox_bond:
      Oct 02 23:30:49 c-236-4-240-243 network[2477]: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device enp8s0f0v1 does not seem to be present, delaying initialization.
      Oct 02 23:30:49 c-236-4-240-243 /etc/sysconfig/network-scripts/ifup-eth[2500]: Device enp8s0f0v1 does not seem to be present, delaying initialization.
      Oct 02 23:30:49 c-236-4-240-243 network[2407]: WARN : [/etc/sysconfig/network-scripts/ifup-eth] Unable to start slave device ifcfg-enp8s0f0v1 for master mellanox_bond.
      Oct 02 23:30:49 c-236-4-240-243 /etc/sysconfig/network-scripts/ifup-eth[2501]: Unable to start slave device ifcfg-enp8s0f0v1 for master mellanox_bond.
      Oct 02 23:30:49 c-236-4-240-243 network[2502]: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device enp8s0f1v1 does not seem to be present, delaying initialization.
      Oct 02 23:30:49 c-236-4-240-243 /etc/sysconfig/network-scripts/ifup-eth[2525]: Device enp8s0f1v1 does not seem to be present, delaying initialization.
      Oct 02 23:30:49 c-236-4-240-243 network[2407]: WARN : [/etc/sysconfig/network-scripts/ifup-eth] Unable to start slave device ifcfg-enp8s0f1v1 for master mellanox_bond.

      The VF interfaces are not present. While config_mellanox_no_promisc.yaml
      includes a directive to create 4 VFs:

      - type: sriov_pf
        name: nic11 => CHANGE ME
        mtu: 9000
        numvfs: 4

      ... this information does not seem to be reflected in the files that were
      created under /etc/sysconfig/network-scripts:

      root@c-236-4-240-243:/etc/sysconfig/network-scripts# cat ifcfg-enp8s0f0np0
      # This file is autogenerated by os-net-config
      DEVICE=enp8s0f0np0
      ONBOOT=yes
      HOTPLUG=no
      NM_CONTROLLED=yes
      PEERDNS=no
      BOOTPROTO=none
      MTU=9000
      DEFROUTE=no
      root@c-236-4-240-243:/etc/sysconfig/network-scripts# cat ifcfg-enp8s0f0v1
      # This file is autogenerated by os-net-config
      DEVICE=enp8s0f0v1
      ONBOOT=yes
      HOTPLUG=no
      NM_CONTROLLED=no
      PEERDNS=no
      MASTER=mellanox_bond
      SLAVE=yes
      BOOTPROTO=none

      So I'm not sure how this is supposed to work.

      Did you try the reproduction instructions on RHEL-9.2? How were the interfaces
      defined in the yaml file created after boot?

      — Additional comment from Benjamin Poirier on 2024-10-02 21:02:02 UTC —

      > I installed os-net-config from the rhoso-18.0-for-rhel-9-x86_64-rpms
      ^
      I meant "openstack-network-scripts", sorry.

      — Additional comment from Karthik Sundaravel on 2024-10-03 01:54:15 UTC —

      os-net-config creates /var/lib/os-net-config/sriov_config.yaml, where the numvfs and other VF configurations are stored.
      os-net-config also adds a sriov_config service file.
      On reboot, the os-net-config sriov_config service reads sriov_config.yaml and applies the settings.

      And then the network service brings up the bonds configured in the ifcfg files.
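
      For anyone checking this on a reproducer node, the pieces involved can be inspected with something like (a sketch):

      cat /var/lib/os-net-config/sriov_config.yaml   # numvfs and other VF settings written by os-net-config
      systemctl status sriov_config.service          # applies sriov_config.yaml at boot
      systemctl status network.service               # brings up the bonds from the ifcfg files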

      — Additional comment from Benjamin Poirier on 2024-10-04 13:10:22 UTC —

      > During reboot, os-net-config sriov_config service will read the sriov_config.yaml and apply the settings.

      At the time when I wrote comment 20, "sriov_config.service" was failing and I
      didn't notice. It was failing because I had installed os-net-config in a venv
      instead of system-wide and the service file doesn't handle that. I installed
      it under /usr like the original instructions said, and I also enabled
      "network.service"; then the network config was applied at boot as expected.

      > STEP 5) Reboot one machine. Ping doesn't work.

      In my case, now that the network services are starting properly, the problem
      does not reproduce; ping works after reboot and the vf interfaces do NOT have
      the promisc flag. I had a call with Karthik yesterday and showed him that.

      I guess the problem depends on some more specific configuration to reproduce.
      Can you please try to narrow it down?

      — Additional comment from Madhur Gupta on 2024-10-11 11:15:11 UTC —

      Hello ksundara@redhat.com and bpoirier@redhat.com,

      Do you need anything from our side or any data from the customer to help expedite the resolution?

      As informed earlier, their upgrade is planned for 1st of November and they won't be able to push this date further.

      @njohnston@redhat.com please let us know if we can help with anything. If you need to connect with the customer's reproducer, it can be provided as well.

      Regards,
      Madhur Gupta
      TAM for Belgacom

      — Additional comment from Karthik Sundaravel on 2024-10-12 01:29:28 UTC —

      Benjamin (partner engineer from Nvidia) is working on the issue. This needs investigation from Nvidia, since the PF/VF configurations applied by os-net-config in both the working (OSP16.2) and non-working (OSP17.1) cases are the same, but we are seeing different behaviour from the SR-IOV NIC.

      @Madhur
      We have reproduced the issue on our development machines and given Benjamin access to investigate. We have "ConnectX-5 Ex" in our lab, while the customer has seen this issue on "ConnectX-6 Lx". If we could get a couple of machines from the customer (where the issue is seen) for Benjamin, that would be helpful as well.

      @Benjamin, we have a deadline of 1st November. Please note that we have a high priority and date pressure to have a fix by then.

      — Additional comment from Benjamin Poirier on 2024-10-15 16:12:11 UTC —

      Karthik provided access to a system at Red Hat where the problem occurs. I
      began to investigate the situation on that system. It did not use VLANs; it
      was just a bond over two VFs. I observed the following:
      *)
      When the problem occurred, I deleted the bond and assigned the IP address
      directly on the VF that was the active bond member. The problem continued, so
      it might not be related to bond or vlan. In the same way as reported in the
      description, after setting that VF to promisc mode, the problem was resolved
      (ping worked).
      *)
      When the problem occurs, `ip -s link` shows that the packet RX counter on the
      VF section of the PF netdev increases, but the packet RX counter on the VF
      netdev itself does not increase.
      `ethtool -S` on the VF shows that the rx_steer_missed_packets counter
      increases.

      I tried to dump the steering rules on the adapter using 'mlxdump fsdump' but
      it did not work. I opened a ticket for this at Nvidia (RM-4124320).
      *)
      If I do `systemctl disable openvswitch.service` and reboot, the problem does
      not occur. However, openvswitch still gets started at boot by network.service.
      So there might be different behavior depending on how/when OVS is started.
      Moreover, the OVS configuration does not actually include the ConnectX nic
      AFAIK. It includes two Intel nics.
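
      For reference, the counters mentioned in the observations above can be checked with something like the following (a sketch; replace the placeholders with the PF and VF netdev names on the affected host):

      ip -s link show dev <pf netdev>                          # per-VF RX counters appear under the vf entries
      ip -s link show dev <vf netdev>                          # RX counter on the VF netdev itself
      ethtool -S <vf netdev> | grep rx_steer_missed_packets    # increases while traffic is being dropped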

      Can you try again to provide simple but complete reproduction instructions?

      — Additional comment from Karthik Sundaravel on 2024-10-16 14:23:16 UTC —

      Benjamin,

      I'll try to reproduce the issue on a non-OpenStack setup. I'll share the steps when I have one.

      Meanwhile, as we speak, the OVS bridges have all been cleaned up on those machines and we still see some interference between openvswitch and the Mellanox cards.
      Does this call for a look from the openvswitch team?

      — Additional comment from Greg Rakauskas on 2024-10-16 15:57:52 UTC —

      Hi Eran,

      Will this BZ be verified for 17.1.4?

      We need to know whether to add this BZ to the RHOSP 17.1.4 Release Notes.

      Thanks,
      --Greg R.

      — Additional comment from Karthik Sundaravel on 2024-10-16 16:42:17 UTC —

      Hi Madhur,

      We (Benjamin and I) have found that disabling DPDK solves the connectivity issue. We would like to understand: in OSP16.2, does the customer use DPDK on any port (it need not be on Mellanox NICs) in the affected node?

      — Additional comment from Madhur Gupta on 2024-10-17 13:28:17 UTC —

      (In reply to Karthik Sundaravel from comment #29)
      > Hi Madhur,
      >
      > We (Benjamin and myself) have found that disabling DPDK solves the
      > connectivity issue. We would like to understand if in OSP16.2, does the
      > customer use DPDK on any port (need not be mellanox nics) in the affected
      > node ?

      Hi Karthik,

      >We would like to understand if in OSP16.2, does the customer use DPDK on any port (need not be mellanox nics) in the affected node ?

      Yes, the customer has confirmed that they faced the issue with DPDK-enabled workloads, but they will try to reproduce it in a non-DPDK environment.

      However, DPDK is important for the customer's workload.

      Let me know if you both need anything else.

      — Additional comment from Madhur Gupta on 2024-10-17 17:03:31 UTC —

      Hello Karthik and Benjamin,

      Here is the response from the customer contact:

      "
      Hey Guys,

      I just had a look at it and indeed we only see the issue on the computes that also have dpdk.

      We have similar computes which don't have dpdk but still have the same vf setup; they are not affected by the issue.

      One caveat to make, which is also mentioned in the case already (and I think one of the engineers mentioned it again):

      The interfaces that are used for ovs-dpdk are not the same interfaces as the ones used for the VFs and infra VLANs.

      They come from completely different network cards.

      Yet for some reason the fact of having dpdk in the host appears to make some difference."

      — Additional comment from Karthik Sundaravel on 2024-10-17 17:47:11 UTC —

      Hi Madhur,

      When DPDK is enabled, all the DPDK-capable interfaces are probed by openvswitch, which appears to be one of the reasons why interfaces that are not part of the OvS ports are getting impacted.

      We have tried a configuration that could limit the probes [1], which helps solve the issue in our dev setups (CX5 Ex cards). Can the same be verified on the customer's staging environment, which has CX6 Lx cards?

      Meanwhile, we have started an internal discussion on analysing the side effects of using this configuration [1].

      [1] ovs-vsctl set o . other_config:dpdk-extra="-a 0000:00:00.0"
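
      For context, "-a 0000:00:00.0" hands DPDK an EAL allow-list containing only a PCI address that does not correspond to any DPDK-capable NIC, so OVS-DPDK should stop probing the other PCI devices (including the Mellanox PFs) at startup. A sketch of applying and checking it, using commands already shown in this ticket:

      ovs-vsctl set Open_vSwitch . other_config:dpdk-extra="-a 0000:00:00.0"
      systemctl restart openvswitch
      ovs-vsctl list Open_vSwitch | grep other_config   # confirm dpdk-extra is present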

      — Additional comment from Karthik Sundaravel on 2024-10-18 04:37:10 UTC —

      Hi Benjamin

      Here are the steps performed on a standalone machine to reproduce the issue on CX5 cards.

      Prerequisites
      ---------------
      RHEL 9.2 (5.14.0-284.66.1.el9_2.x86_64)
      Python 3.9
      Python3-pip
      openstack-network-scripts
      Openvswitch
      systemctl start openvswitch
      systemctl enable openvswitch
      systemctl enable network
      ovs-vsctl set o . other_config:dpdk-init=true
      systemctl restart openvswitch

      Download and install os-net-config
      ----------------------------------
      git clone https://github.com/os-net-config/os-net-config.git -b stable/wallaby
      pip install pyroute2 jsonschema oslo_concurrency
      cd os-net-config
      python setup.py install --prefix=/usr

      Generate the config.yaml
      -----------------------
      Download the config.yaml from the BZ and modify 'CHANGEME' to appropriate nics/vlans/ip address.
      The nic mapping could be found by running 'os-net-config -i'

      Generate the ifcfgs
      --------------------
      os-net-config -c ~/config.yaml -p ifcfg -d

      Test
      ----
      Run a ping test from one machine to another.
      The ping test fails.

      Workaround to enable ping
      -------------------------
      Option A:
      ovs-vsctl set o . other_config:dpdk-extra="-a 0000:00:00.0"
      systemctl restart openvswitch
      Check if ping works; if not, run 'systemctl restart network'.

      Option B:
      ip link set dev <vf device> promisc on

      Option C: (may not always work)
      ifdown <first member of the bond>

      — Additional comment from Kenny Tordeurs on 2024-10-18 08:57:28 UTC —

      (In reply to Karthik Sundaravel from comment #32)
      > Hi Madhur,
      >
      > When DPDK is enabled, all the dpdk capable interfaces would be probed by
      > openvswitch, which appears to be one of the reason why the interfaces not
      > part of the OvS ports are getting impacted.
      >
      > We have tried a configuration that could limit the probes [1], which helps
      > solve the issue in our dev setups (Cx5 EX cards). Can the same be verified
      > on the customer's staging environment which has CX6 LX cards.
      >
      > Meanwhile we have started a discussion internally, on analysing the side
      > effects of using this [1] configuration.
      >
      > [1] ovs-vsctl set o . other_config:dpdk-extra="-a 0000:00:00.0"

      Hi Karthik, thanks for the workaround, which the customer applied, but it did require a reboot (not sure if we can simply restart a service instead?).

      Thanks

      — Additional comment from Karthik Sundaravel on 2024-10-18 10:56:01 UTC —

      Can you please perform
      'systemctl restart openvswitch'?
      If that still does not help, 'systemctl restart network' or a reboot may be required.

      — Additional comment from Karthik Sundaravel on 2024-10-21 10:40:59 UTC —

      Hi Kenny

      Can you please confirm whether the suggested workaround solved the connectivity issue on the Linux bond (NIC-partitioned)?

      Regards
      Karthik S

      — Additional comment from Karthik Sundaravel on 2024-10-21 12:01:56 UTC —

      Hi Miguel / Eran

      In [1], we have prepared the steps to apply the workaround for this BZ. We need to test the workaround in a few scenarios for functionality and performance:
      a) NIC Partitioning on Mellanox nics + DPDK on mellanox nics
      b) NIC Partitioning on Mellanox NICS + DPDK on Intel nics
      c) DPDK on Intel NICs (where nics are bound with vfio-pci)
      d) DPDK on Mellanox Nics

      [1] https://docs.google.com/document/d/1hCwSnCFtBjdBvGSSG71SMUyYWFCZM-90UXOhMb2an38/edit?usp=sharing

      — Additional comment from Kenny Tordeurs on 2024-10-21 14:30:40 UTC —

      (In reply to Karthik Sundaravel from comment #36)
      > Hi Kenny
      >
      > Can you please confirm it the suggested workaround solved the connectivity
      > issue on the linux bond (NIC partitioned)
      >
      > Regards
      > Karthik S

      Yes "ovs-vsctl set o . other_config:dpdk-extra="-a 0000:00:00.0"" does solve the issue.
      Once applied, promisc mode is not needed anymore.

      BUT 'systemctl restart openvswitch' and 'systemctl restart network' were not enough to make it work; a reboot was needed to get it working.

      I'm only wondering if the openvswitch service is the correct one to restart; if you look at the currently running OVS-related services:
      [root@oscar05com002 ~]# systemctl status ovs|grep service
      _ ovsdb-server.service - Open vSwitch Database Unit
      _ ovs-vswitchd.service - Open vSwitch Forwarding Unit
      _ ovs-delete-transient-ports.service - Open vSwitch Delete Transient Ports

      [root@oscar05com002 ~]# systemctl status vswitch|grep service
      _ openvswitch.service - Open vSwitch

      Additional questions:

      • How can we enable this by default or is this only a workaround?
      • Will this fix survive a RHEL9 leapp upgrade ?

      — Additional comment from Karthik Sundaravel on 2024-10-21 16:11:02 UTC —

      Hi Kenny

      We are in the process of getting this configuration applied from the TripleO deployment [1], which should take care of the reboots on new nodes.
      Before we suggest this workaround, we need it to be verified for regression and performance.
      I think that, this being an OVS DB change, the values should be retained after the leapp upgrade. However, that could be verified as well.

      [1] https://docs.google.com/document/d/1hCwSnCFtBjdBvGSSG71SMUyYWFCZM-90UXOhMb2an38/edit?usp=sharing
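
      For what it's worth, whether the value survived an upgrade or redeploy can be checked directly in the OVS DB (a sketch):

      ovs-vsctl get Open_vSwitch . other_config:dpdk-extra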

      — Additional comment from Nate Johnston on 2024-10-24 12:37:40 UTC —

      Viji has a patch up to allow OVS extra options to be configured in the template to fix this. Requested blocker status since this is needed for the RHOSPPRIO Belgacom escalation.

      — Additional comment from RHEL Program Management on 2024-10-24 12:37:50 UTC —

      This bugzilla has been removed from the release since it does not have an acked release flag. For details, see https://mojo.redhat.com/docs/DOC-1144661#jive_content_id_OSP_Release_Planning.

      — Additional comment from Madhur Gupta on 2024-10-24 12:42:39 UTC —

      (In reply to Karthik Sundaravel from comment #39)
      > Hi Kenny
      >
      > We are in the process of getting this configuration from Tripleo deployment
      > [1], which should take care of the reboots in new nodes.
      > Before we suggest this workaround we need it to be verified for regression
      > and performance.
      > I think this being a ovs-db change should retain the values after leap
      > upgrade. However it could be verified as well.
      >
      >
      > [1]
      > https://docs.google.com/document/d/1hCwSnCFtBjdBvGSSG71SMUyYWFCZM-
      > 90UXOhMb2an38/edit?usp=sharing

      Hello Team,

      The customer has done some extra testing:

      When you run 'ovs-vsctl set o . other_config:dpdk-extra="-a 0000:00:00.0"' you get the following config:
      [root@oscar05com002 tripleo-admin]# ovs-vsctl list Open_vSwitch|grep other_config
      other_config :

      {dpdk-extra="-a 0000:00:00.0", dpdk-init="true", dpdk-socket-limit="4096", dpdk-socket-mem="4096", ovn-chassis-idx-b2204b60-253b-4654-b0e2-2460839a7402="", pmd-cpu-mask="1c0000000000000000000000000000001c", vhost-postcopy-support="true", vlan-limit="0"}

      They tried restarting a large number of services, but have not yet found anything that makes the networking work without doing a reboot.

      Indeed, as Dave pointed out, when you run a deploy again your extra config gets erased and you get reverted to:
      [root@oscar05com002 ~]# ovs-vsctl list Open_vSwitch|grep other_config
      other_config :

      {dpdk-extra=" -n 12", dpdk-init="true", dpdk-socket-limit="4096", dpdk-socket-mem="4096", ovn-chassis-idx-b2204b60-253b-4654-b0e2-2460839a7402="", pmd-cpu-mask="1c0000000000000000000000000000001c", vhost-postcopy-support="true", vlan-limit="0"}

      Also, in this direction all network connectivity keeps working fine until you reboot the host.
      Once rebooted, all connectivity will indeed be lost again.

      All in all, I haven't found a negative effect of this extra config yet,
      except the fact that it's not permanent and can't be applied on the fly for the moment.

      — Additional comment from Karthik Sundaravel on 2024-10-24 15:08:07 UTC —

      Hi Madhur,

      Thanks for the input. We are planning to ship this workaround as part of the deployment itself by exposing the internal tripleo-ansible parameter `tripleo_ovs_dpdk_extra` via THT parameters. That should take care of updates.

      Regards
      Karthik S
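
      For illustration only, the kind of Heat environment snippet this could enable might look like the following; the parameter name is hypothetical, since the only name confirmed in this ticket is the internal tripleo-ansible variable tripleo_ovs_dpdk_extra:

      parameter_defaults:
        # Hypothetical THT parameter name; whichever parameter ends up exposing
        # tripleo_ovs_dpdk_extra would carry the workaround value from this BZ.
        OvsDpdkExtra: "-a 0000:00:00.0"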

      — Additional comment from RHEL Program Management on 2024-10-25 16:27:40 UTC —

      This item has been properly Triaged and planned for the release, and Target Release is now set to match the release flag.

      — Additional comment from Mike Burns on 2024-10-25 16:28:57 UTC —

      TRAC approved blocker https://issues.redhat.com/browse/OSP-33001

      — Additional comment from Benjamin Poirier on 2024-10-25 22:02:49 UTC —

      By using Karthik's instructions, I was able to reproduce the problem at
      Nvidia. I was also able to simplify the instructions so that os-net-config is
      not needed:

      Prepare host 1:
      subscription-manager repos --enable fast-datapath-for-rhel-9-x86_64-rpms
      dnf install --allowerasing -y openvswitch3.3

      grubby --update-kernel ALL --args="hugepages=512"
      grub2-mkconfig -o /boot/grub2/grub.cfg

      systemctl start openvswitch.service
      ovs-vsctl set o . other_config:dpdk-init=true

      reboot

      Prepare host 2:
      ip link set dev eth2 up
      ip addr add 192.168.1.2/24 dev eth2

      Reproduce problem on host 1:
      echo 1 > /sys/class/net/eth2/device/sriov_numvfs
      systemctl start openvswitch.service
      ip link set dev eth4 up # eth4 is the new vf netdev
      ip addr add 192.168.1.1/24 dev eth4

      From host 2, ping 192.168.1.1. Does not work, rx_steer_missed_packets
      increases.

      As we can see, vlan and bond are not needed to reproduce the problem.

      Also, if we change the reproduction command sequence to:
      systemctl start openvswitch.service
      echo 1 > /sys/class/net/eth2/device/sriov_numvfs
      ip link set dev eth4 up
      ip addr add 192.168.1.1/24 dev eth4

      The result is good. So the problem seems related to something that ovs
      configures at startup.

      > I tried to dump the steering rules on the adapter using 'mlxdump fsdump' but
      > it did not work. I opened a ticket for this at Nvidia (RM-4124320).

      It did not work because a special license is needed. I was able to run the
      tool on Nvidia systems. In both the bad and good cases above, the steering
      rules are almost the same. The only difference is related to the vf mac
      address which changes each time the vf is created. So this did not provide an
      insight on why traffic is dropped. I asked my coworkers for advice on how to
      get more info on why the rx_steer_missed_packets counter is increasing but
      didn't get any reply. Note that many of them are on vacation.

      Meanwhile, I also tried different ovs package versions on RHEL-9 and noticed
      that the problem also reproduces with openvswitch3.1 but not with
      openvswitch3.0.

      I reproduced the issue using upstream ovs and dpdk releases and, after testing
      various combinations, narrowed it down to the following two:

      • openvswitch-3.0.7 + dpdk-21.11.8: good
      • openvswitch-3.0.7 + dpdk-22.03: bad

      I then bisected on the dpdk repository which identified the following commit:
      87af0d1e1bcc15ca414060263091a0f880ad3a86 is the first bad commit
      commit 87af0d1e1bcc15ca414060263091a0f880ad3a86
      Author: Michael Baum <michaelba@nvidia.com>
      Date: Mon Feb 14 11:35:06 2022 +0200

      net/mlx5: concentrate all device configurations

      Move all device configure to be performed by mlx5_os_cap_config()
      function instead of the spawn function.
      In addition move all relevant fields from mlx5_dev_config structure to
      mlx5_dev_cap.

      Signed-off-by: Michael Baum <michaelba@nvidia.com>
      Acked-by: Matan Azrad <matan@nvidia.com>

      I will contact the respective developers.

      — Additional comment from Eran Kuris on 2024-10-27 07:20:08 UTC —

      (In reply to Greg Rakauskas from comment #28)
      > Hi Eran,
      >
      > Will this BZ be verified for 17.1.4?
      >
      > We need to know whether to add this BZ to the RHOSP 17.1.4 Release Notes.
      >
      > Thanks,
      > --Greg R.

      Hi Greg,
      It depends on Nvidia, as you can see in the above comments.
      Maybe we will be able to provide a workaround until we have an official fix.

      — Additional comment from errata-xmlrpc on 2024-11-01 18:34:46 UTC —

      Bug report changed to ON_QA status by Errata System.
      A QE request has been submitted for advisory RHSA-2024:138124-01
      https://errata.engineering.redhat.com/advisory/138124

      — Additional comment from errata-xmlrpc on 2024-11-01 18:34:55 UTC —

      This bug has been added to advisory RHSA-2024:138124 by Jason Joyce (jjoyce@redhat.com)

      — Additional comment from Benjamin Poirier on 2024-11-04 13:53:52 UTC —

      > I will contact the respective developers.

      I explained the issue to Michael Baum last week. He later said that he reviewed
      the commit and did not find a problem.

      We (Inbox team) are still trying to get help from someone who is familiar with
      OVS and/or dpdk.

      — Additional comment from Kenny Tordeurs on 2024-11-04 15:11:05 UTC —

      Adding the following information here:

      Updating the firmware to version 26.41.1000 resolved the issue based on https://access.redhat.com/solutions/7063133

      — Additional comment from Kenny Tordeurs on 2024-11-04 15:24:23 UTC —

      (In reply to Kenny Tordeurs from comment #51)
      > Adding the following information here:
      >
      > Updating the firmware to version 26.41.1000 resolved the issue based on
      > https://access.redhat.com/solutions/7063133

      The firmware update only fixed the issue around flooding of the STP packets; sorry for the confusion.
