Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.17
Component/s: Networking / ovn-kubernetes
Labels:
- SDN:OVNK:UserDefinedNetworks:Primary

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
None
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
SDN Sprint 265
sprint_count:
1

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

When syncing all networks (syncAll), we seem to be hard failing:
E1122 17:27:54.004932 14129 node_network_controller_manager.go:193] Stopping node network controller manager, err=failed to start NAD controller: initial sync failed: failed to sync network test-namespace-36.user-defined-network-36: [node-network-controller-manager network manager]: failed to start network test-namespace-36.user-defined-network-36: failed to add network to node gateway for network test-namespace-36.user-defined-network-36 at node ovn-worker3: error waiting for node readiness: failed to set network test-namespace-36.user-defined-network-36's openflow ports for default bridge; error: failed while waiting on patch port "patch-breth0_test.namespace.36.user.defined.network.36_ovn-worker3-to-br-int" to be created by ovn-controller and while getting ofport. stderr: ovs-vsctl: no row "patch-breth0_test.namespace.36.user.defined.network.36_ovn-worker3-to-br-int" in table Interface
, error: exit status 1
If we end up with an error syncing one of the networks, should we keep restarting?
ACTION: Let's open a bug.
We agreed that we shouldn't crash on errors that should otherwise self-resolve with retries; the controller should be modified to handle that scenario.

TL;DR: if the controller has a retry mechanism for something and we expect it to reconcile eventually, we must not return the error as a hard failure and restart. Instead, we should finish starting up the networks and let the individual controller deal with the reconciliation.

links to

upstream PR

Assignee: Riccardo Ravaioli

Reporter: Surya Seetharaman

QA Contact: Anurag Saxena

Need Info From: None

Votes: 0

Watchers: 7

Created: 2024/12/02 7:52 AM

Updated: 2025/07/08 4:39 PM
