-
Epic
-
Resolution: Unresolved
-
Minor
-
None
-
None
-
None
-
Adopt active heartbeating timeout mechanism for neutron RPC calls using oslo.messaging
-
False
-
-
False
-
Committed
-
No Docs Impact
-
To Do
-
Committed
-
Proposed
-
100% To Do, 0% In Progress, 0% Done
-
-
-
Networking; Neutron
This Epic tracks the upstream effort to adopt the more robust RPC timeout monitoring mechanism implemented in oslo.messaging, which is enabled by passing the library's call_monitor_timeout option when creating RPC clients.
Currently, the Neutron RPC client implements its own back-off mechanism to handle timeouts: it first fails long calls, then repeats them with a higher timeout, and continues to do so up to a limit. The suggestion here is to instead let oslo.messaging run active heartbeating / probing of the RPC channel and NOT fail long operations that merely take longer to complete, as long as the delay is not caused by the death of the neutron call handler.
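The difference between the two strategies can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: it does not use oslo.messaging or Neutron code, all function and parameter names other than call_monitor_timeout are hypothetical, the doubling factor in the back-off model is an assumption, and the heartbeat interval is a free parameter here rather than the library's actual value. Times are simulated seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    succeeded: bool
    attempts: int
    elapsed: float  # simulated seconds the caller spent waiting


def call_with_backoff(duration: float, initial_timeout: float,
                      max_timeout: float) -> Outcome:
    """Model of the back-off approach: fail the call when the timeout
    expires, then repeat it with a higher timeout, up to a limit.
    Doubling the timeout is an assumption made for this sketch."""
    timeout = initial_timeout
    attempts = 0
    elapsed = 0.0
    while True:
        attempts += 1
        if duration <= timeout:
            return Outcome(True, attempts, elapsed + duration)
        elapsed += timeout  # the whole timeout was spent waiting in vain
        if timeout >= max_timeout:
            return Outcome(False, attempts, elapsed)
        timeout = min(timeout * 2, max_timeout)


def call_with_heartbeat(duration: float, call_monitor_timeout: float,
                        timeout: float, heartbeat_interval: float,
                        handler_dies_at: Optional[float] = None) -> Outcome:
    """Model of active heartbeating: the handler signals liveness every
    heartbeat_interval while it works. The caller fails only if nothing
    arrives within call_monitor_timeout or the hard timeout is reached."""
    handler_survives = handler_dies_at is None or handler_dies_at >= duration
    alive_until = duration if handler_survives else handler_dies_at
    last_seen = 0.0
    next_beat = heartbeat_interval
    while True:
        deadline = min(last_seen + call_monitor_timeout, timeout)
        if handler_survives and duration <= deadline:
            return Outcome(True, 1, duration)  # the reply arrived in time
        if next_beat < alive_until and next_beat <= deadline:
            last_seen = next_beat  # a heartbeat extends the deadline
            next_beat += heartbeat_interval
            continue
        return Outcome(False, 1, deadline)  # silence or hard timeout


if __name__ == "__main__":
    # A 50-second operation on a loaded cluster.
    print(call_with_backoff(50, initial_timeout=10, max_timeout=100))
    print(call_with_heartbeat(50, call_monitor_timeout=10, timeout=120,
                              heartbeat_interval=5))
    # The handler dies 12 seconds into the call.
    print(call_with_heartbeat(50, call_monitor_timeout=10, timeout=120,
                              heartbeat_interval=5, handler_dies_at=12))
```

In this model the back-off caller needs four attempts and 120 simulated seconds to complete a 50-second operation, while the heartbeating caller completes it in a single attempt and still detects a dead handler shortly after its heartbeats stop.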
The implementation promises improved behavior in loaded clusters when communicating with AMQP agents (neutron-dhcp, neutron-sriov). Specifically, longer operations in a cluster under load should fail less frequently.