-
Bug
-
Resolution: Can't Do
-
Major
-
rhel-9
-
None
-
rhel-net-ovs-dpdk
-
-
-
ssg_networking
Problem Description: Clearly explain the issue.
From time to time, the RAFT leader transfers leadership in order to write a snapshot:
2025-04-17T03:51:58.983Z|00046|raft|INFO|Transferring leadership to write a snapshot.
The problem is that this transfer seems to cause interruptions on the Neutron side, as described in https://issues.redhat.com/browse/OSPRH-14377 and https://issues.redhat.com/browse/OSPRH-16149
In https://issues.redhat.com/browse/OSPRH-16149, a Neutron port was created during a leadership transfer; Neutron then failed to bind the port because it did not exist on the OVN side.
Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).
Some instance startups in the RHOSP environment fail. This may complicate automation on the customer's side and force them to implement cleanup and retry logic.
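For illustration only, the cleanup and retry logic mentioned above could look roughly like the sketch below. It is not taken from the customer's automation: `PortBindingError`, `retry_with_cleanup`, and the `operation` and `cleanup` callables are hypothetical names, and the attempt count and delays are assumed values.

```python
import time


class PortBindingError(Exception):
    """Hypothetical error standing in for a failed Neutron port binding."""


def retry_with_cleanup(operation, cleanup, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on PortBindingError run cleanup() and retry.

    The delay doubles after every failed attempt (exponential backoff).
    The last failure is re-raised so the caller can still see it.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except PortBindingError:
            # Remove whatever the failed attempt left behind (assumed step).
            cleanup()
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Here `operation` would wrap the instance or port creation call and `cleanup` would delete the half-created resources; both are left to the caller because they depend on the automation in use.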
Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).
openvswitch3.3-3.3.0-49.el9fdp.x86_64
Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).
Likely a new issue.
Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.
It happens occasionally in the customer's deployment, when batches of instances are started at the same time as changes occur on the OVN cluster side.
Reproduction Steps: Provide detailed steps or scripts to replicate the issue.
Irrelevant
Expected Behavior: Describe what should happen under normal circumstances.
Neutron operations shouldn't be interrupted by OVN issues; OVN should provide consistent communication with its control plane.
Observed Behavior: Explain what actually happens.
It appears that the leadership transfer causes communication timeouts.
Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.
Compared the Neutron error logs from https://issues.redhat.com/browse/OSPRH-16149 with OVN events.
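The comparison above can be partly automated. The following is a minimal sketch, assuming the ovsdb-server log lines have the pipe-separated layout of the message quoted in the problem description. The function names and the 60-second matching window are assumptions made for illustration, not part of any OVS tooling.

```python
from datetime import datetime, timedelta, timezone

# Message text taken from the log line quoted in this report.
MARKER = "Transferring leadership to write a snapshot"


def transfer_times(lines):
    """Return UTC timestamps of RAFT leadership transfers found in log lines.

    Expected layout: <timestamp>|<sequence>|<module>|<level>|<message>
    """
    times = []
    for line in lines:
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) == 5 and fields[2] == "raft" and MARKER in fields[4]:
            ts = datetime.strptime(fields[0], "%Y-%m-%dT%H:%M:%S.%fZ")
            times.append(ts.replace(tzinfo=timezone.utc))
    return times


def near_transfer(event_time, transfers, window=timedelta(seconds=60)):
    """Return the transfers that fall within `window` of a Neutron error time."""
    return [t for t in transfers if abs(event_time - t) <= window]
```

For example, the lines of ovsdb-server-nb.log can be passed to `transfer_times`, and each Neutron error timestamp (converted to UTC) checked with `near_transfer`.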
Logs: If you collected logs please provide them (e.g. sos report, /var/log/openvswitch/* , testpmd console)
The latest set of sosreports contains relevant messages in /var/log/containers/openvswitch/ovsdb-server-nb.log and /var/log/containers/openvswitch/ovsdb-server-sb.log. The issue occurred on 2025-04-17 at 04:52 local time / 03:52 UTC.
- relates to
-
FDP-1392 OVN routers are failing over and causing downtime
-
- Closed
-