-
Bug
-
Resolution: Unresolved
-
Normal
-
4.17
-
Quality / Stability / Reliability
-
False
-
SDN Sprint 265
-
1
Description of problem:
- In syncAll for networks, we seem to be hard-failing on an error from a single network:
- E1122 17:27:54.004932 14129 node_network_controller_manager.go:193] Stopping node network controller manager, err=failed to start NAD controller: initial sync failed: failed to sync network test-namespace-36.user-defined-network-36: [node-network-controller-manager network manager]: failed to start network test-namespace-36.user-defined-network-36: failed to add network to node gateway for network test-namespace-36.user-defined-network-36 at node ovn-worker3: error waiting for node readiness: failed to set network test-namespace-36.user-defined-network-36's openflow ports for default bridge; error: failed while waiting on patch port "patch-breth0_test.namespace.36.user.defined.network.36_ovn-worker3-to-br-int" to be created by ovn-controller and while getting ofport. stderr: ovs-vsctl: no row "patch-breth0_test.namespace.36.user.defined.network.36_ovn-worker3-to-br-int" in table Interface, error: exit status 1
- If we end up with an error syncing one of the networks, should we keep restarting?
- ACTION: Let’s open a bug
- We agreed that we shouldn't crash on errors that should otherwise self-resolve with retries; the controller should be modified to handle that scenario.
TL;DR: if the controller has a retry mechanism for something and we expect it to reconcile eventually, we must not return the error as a hard failure and restart. Instead, we should finish starting up the networks and let the individual controller deal with the reconciliation.