OpenShift Bugs / OCPBUGS-20353

NetworkManager fails on RHEL 8.6 worker configuring br-ex interface post reboot


      Description of problem:

      Upon node restart, NetworkManager fails to activate the br-ex interface. The bond has been configured using the NMState operator.

      Version-Release number of selected component (if applicable):

      OCP 4.12 with RHEL 8.6 worker nodes

      How reproducible:

      Upon node reboot

      Steps to Reproduce:

      1. Create a bond from ens1f0 + ens1f1 (bond0).
      2. Reboot the node.
      3. Observe the following on the affected node:
      
      [root@bl9261 ticams003]# ll /etc/sysconfig/network-scripts/*
      -rw-r--r--. 1 root root 347 May  8 13:54 /etc/sysconfig/network-scripts/ifcfg-bond0
      -rw-r--r--. 1 root root 453 Jun  2 16:03 /etc/sysconfig/network-scripts/ifcfg-bond0.3200
      -rw-r--r--. 1 root root 146 May  8 13:54 /etc/sysconfig/network-scripts/ifcfg-ens1f0
      -rw-r--r--. 1 root root 146 May  8 13:54 /etc/sysconfig/network-scripts/ifcfg-ens1f1 
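      The bond in step 1 was created via the NMState operator (kubernetes-nmstate). A minimal NodeNetworkConfigurationPolicy for this topology might look like the sketch below; the policy name and bond mode are assumptions, not taken from the customer's actual NNCP, and recent nmstate schema uses `port` where older releases used `slaves`:

```yaml
# Hypothetical NNCP sketch -- not the customer's actual policy.
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: bond0-policy            # assumed name
spec:
  desiredState:
    interfaces:
      - name: bond0
        type: bond
        state: up
        link-aggregation:
          mode: active-backup   # assumed mode
          port:
            - ens1f0
            - ens1f1
      - name: bond0.3200        # VLAN subinterface seen in the ifcfg listing
        type: vlan
        state: up
        vlan:
          base-iface: bond0
          id: 3200
```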

      Actual results:

      The following is visible in the boot logs:
      Aug 25 12:09:44 bl9263 NetworkManager[3388]: <info>  [1692965384.3619] device (br-ex): state change: deactivating -> unmanaged (reason 'removed', sys-iface-state: 'managed')
      Aug 25 12:09:44 bl9263 NetworkManager[3388]: <info>  [1692965384.3625] device (bond0.3200): state change: activated -> deactivating (reason 'unmanaged', sys-iface-state: 'managed')
      Aug 25 12:09:44 bl9263 NetworkManager[3388]: <info>  [1692965384.3648] device (bond0.3200): state change: deactivating -> unmanaged (reason 'removed', sys-iface-state: 'managed')
      Aug 25 12:09:44 bl9263 NetworkManager[3388]: <info>  [1692965384.3652] device (br-ex): state change: activated -> deactivating (reason 'unmanaged', sys-iface-state: 'managed')
      Aug 25 12:09:44 bl9263 NetworkManager[3388]: <info>  [1692965384.3663] device (br-ex): state change: deactivating -> unmanaged (reason 'removed', sys-iface-state: 'managed')
      Aug 25 12:09:44 bl9263 NetworkManager[3388]: <info>  [1692965384.3663] device (br-ex): detaching ovs interface br-ex
      Aug 25 12:09:44 bl9263 NetworkManager[3388]: <info>  [1692965384.3673] device (br-ex): state change: activated -> deactivating (reason 'unmanaged', sys-iface-state: 'managed')

      Expected results:

      Node networking comes up normally, with br-ex activated.

      Additional info:

      As a workaround, restarting NetworkManager brings the br-ex interface back and node networking recovers.
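      A minimal sketch of that workaround, assuming shell access to the affected node (`br-ex` is the OVS bridge named in the logs above; the verification commands are illustrative):

```shell
# Workaround sketch: restart NetworkManager so it re-activates br-ex.
sudo systemctl restart NetworkManager

# Verify the bridge is managed again and carries the node IP.
nmcli device status | grep br-ex
ip addr show br-ex
```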


            Tim Rozet added a comment -

            Dupe of https://issues.redhat.com/browse/OCPBUGS-18716

            Tim Rozet added a comment -

            Confirmed with customer this is a duplicate of: https://issues.redhat.com/browse/OCPBUGS-18716

             

            The bond is changing MAC addresses. The fix will be handled as part of OCPBUGS-18716.


            Alex Alecu (Inactive) added a comment - - edited

            Then why do I have three other clusters that function correctly with the same ifcfg scripts?

            Also, the bug problem statement is not correct: br-ex gets created every time; it is ovn-k8s-mp0 that does not get created from time to time, and that is what causes this issue.

            Mat Kowalski added a comment - - edited

            At least one reason for the race you are hitting is using ifcfg scripts and nmconnection files at the same time. This is not going to work. You need to move your ifcfg scripts to nmconnection files, as the latter is what kubernetes-nmstate uses.

            Once this is done, if you are still having problems we can have a deeper look.

            Alex Alecu (Inactive) added a comment - - edited

            mkowalsk@redhat.com, indeed our baremetal deployment uses ifcfg scripts. We use the same type of baremetal deployment in all four of our clusters; three of them do not have this issue.

            I understand we should transform everything into nmconnection files, but I'd like to tackle our issue first.
            What we noticed is that the OVN bridge and ports do not start when the issue manifests (seen in the NetworkManager logs).

            Networking is stuck halfway: br-ex was configured, bond0.3200 was added to the bridge, and the IP address was moved from the bond subinterface to the bridge. What we don't see is the ovn-k8s-mp0 device being created.

            We'll set up traces and post more info afterwards.


            Mat Kowalski added a comment -

            First of all, in order for us to provide any more information about this, we need TRACE logs from NetworkManager (https://access.redhat.com/solutions/7006538).

            Secondly, we can see that the nodes are mixing nmconnection files and ifcfg scripts: I can see `/etc/sysconfig/network-scripts/ifcfg-bond0.3200`, which was created manually (or via the customer's automation), as well as `/etc/NetworkManager/system-connections/bond0.3244.nmconnection`, which is the result of an applied NNCP. Using both methods at the same time is not supported; the solution here is to migrate all `ifcfg-*` scripts to nmconnection files. Having both ways of configuring interfaces at the same time is asking for trouble, and this should be fixed before we try any other approach.

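            One way to spot the mixed configuration described above is to scan for interface names that exist in both locations. A hypothetical helper sketch; the directory paths are the standard RHEL locations, and the fixture below only mirrors the layout seen in this bug:

```shell
# Hypothetical helper: list interface names configured BOTH as an ifcfg
# script and as a NetworkManager keyfile (nmconnection).
find_conflicts() {
  local ifcfg_dir=$1 keyfile_dir=$2 f name
  for f in "$ifcfg_dir"/ifcfg-*; do
    [ -e "$f" ] || continue
    name=${f##*/ifcfg-}
    if [ -e "$keyfile_dir/$name.nmconnection" ]; then
      echo "$name"
    fi
  done
}

# Demo on a throwaway fixture mirroring the listing from this bug.
# On a real node you would pass /etc/sysconfig/network-scripts and
# /etc/NetworkManager/system-connections instead.
tmp=$(mktemp -d)
mkdir -p "$tmp/network-scripts" "$tmp/system-connections"
touch "$tmp/network-scripts/ifcfg-bond0.3200"
touch "$tmp/network-scripts/ifcfg-ens1f0"
touch "$tmp/system-connections/bond0.3200.nmconnection"
find_conflicts "$tmp/network-scripts" "$tmp/system-connections"
rm -rf "$tmp"
```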

            Kenny Tordeurs added a comment -

            Is the customization they are performing supported?

            Kenny Tordeurs added a comment -

            Additional information shared by the customer that might be important: three clusters do not have this issue and one cluster does. It must be a configuration parameter issue.
