This epic tracks all the work needed to deliver the feature request described below.
Original Bugzilla ticket:
Description of problem:
While troubleshooting an OVS (kernel datapath, no DPDK) LACP bond issue, I enabled debug-level file logging for the two vlog modules below.
[root@computesriov-0 openvswitch]# ovs-appctl vlog/list
                 console    syslog    file
                 -------    ------    ------
bond               OFF        ERR       DBG
lacp               OFF        ERR       DBG
I then simulated link failures at the uplink switch.
While both member interfaces were up:
2023-07-17T10:45:18.232Z|06252|bond|DBG|bond lacp-bond: enp4s0f0np0 0kB, enp4s0f1np1 0kB
2023-07-17T10:45:28.241Z|06256|bond|DBG|bond lacp-bond: enp4s0f0np0 0kB, enp4s0f1np1 0kB
Brought down the first member interface at the uplink switch:
2023-07-17T10:45:31.672Z|06257|bond|INFO|member enp4s0f0np0: link state down
2023-07-17T10:45:31.672Z|06258|bond|INFO|member enp4s0f0np0: disabled
2023-07-17T10:45:31.672Z|06259|bond|INFO|bond lacp-bond: active member is now enp4s0f1np1
2023-07-17T10:45:31.673Z|08614|bond(revalidator7)|DBG|bond lacp-bond: member enp4s0f0np0: main thread has not yet enabled member
2023-07-17T10:45:31.679Z|08615|bond(revalidator7)|DBG|bond lacp-bond: member enp4s0f0np0: admissibility verdict is to drop pkt, active member: false, may_enable: false, enabled: false, LACP status: negotiated
2023-07-17T10:45:38.686Z|06260|bond|DBG|bond lacp-bond: enp4s0f1np1 0kB
2023-07-17T10:45:48.696Z|06261|bond|DBG|bond lacp-bond: enp4s0f1np1 0kB
Brought down the second member interface:
2023-07-17T10:45:53.835Z|06262|bond|INFO|member enp4s0f1np1: link state down
2023-07-17T10:45:53.835Z|06263|bond|INFO|member enp4s0f1np1: disabled
2023-07-17T10:45:53.835Z|06264|bond|INFO|bond lacp-bond: all members disabled
Brought up the first member interface:
2023-07-17T10:46:28.543Z|06271|bond|INFO|member enp4s0f0np0: link state up
2023-07-17T10:46:28.543Z|06272|bond|INFO|member enp4s0f0np0: enabled
2023-07-17T10:46:28.543Z|06273|bond|INFO|bond lacp-bond: active member is now enp4s0f0np0
2023-07-17T10:46:36.065Z|06274|bond|DBG|bond lacp-bond: enp4s0f0np0 0kB
2023-07-17T10:46:46.075Z|06275|bond|DBG|bond lacp-bond: enp4s0f0np0 0kB
Brought up the second member interface:
2023-07-17T10:46:53.055Z|06276|bond|INFO|member enp4s0f1np1: link state up
2023-07-17T10:46:53.055Z|06277|bond|INFO|member enp4s0f1np1: enabled
2023-07-17T10:46:56.559Z|06278|bond|DBG|bond lacp-bond: enp4s0f0np0 0kB, enp4s0f1np1 0kB
LACP re-negotiated successfully.
[root@computesriov-0 tripleo-admin]# ovs-appctl lacp/show
---- lacp-bond ----
status: active negotiated
sys_id: 04:3f:72:d9:c0:48
sys_priority: 65534
aggregation key: 1
lacp_time: fast
member: enp4s0f0np0: current attached
port_id: 2
port_priority: 65535
may_enable: true
actor sys_id: 04:3f:72:d9:c0:48
actor sys_priority: 65534
actor port_id: 2
actor port_priority: 65535
actor key: 1
actor state: activity timeout aggregation synchronized collecting distributing
partner sys_id: c8:fe:6a:f2:44:00
partner sys_priority: 127
partner port_id: 5
partner port_priority: 127
partner key: 5
partner state: activity timeout aggregation synchronized collecting distributing
member: enp4s0f1np1: current attached
port_id: 1
port_priority: 65535
may_enable: true
actor sys_id: 04:3f:72:d9:c0:48
actor sys_priority: 65534
actor port_id: 1
actor port_priority: 65535
actor key: 1
actor state: activity timeout aggregation synchronized collecting distributing
partner sys_id: c8:fe:6a:f2:44:00
partner sys_priority: 127
partner port_id: 6
partner port_priority: 127
partner key: 5
partner state: activity timeout aggregation synchronized collecting distributing
[root@computesriov-0 tripleo-admin]#
[root@computesriov-0 tripleo-admin]# ovs-appctl bond/show
---- lacp-bond ----
bond_mode: balance-slb
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
lb_output action: disabled, bond-id: -1
all members active: false
updelay: 0 ms
downdelay: 0 ms
next rebalance: 9098 ms
lacp_status: negotiated
lacp_fallback_ab: true
active-backup primary: <none>
active member mac: 04:3f:72:d9:c0:48(enp4s0f0np0)
member enp4s0f0np0: enabled
active member
may_enable: true
member enp4s0f1np1: enabled
may_enable: true
[root@computesriov-0 tripleo-admin]#
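Until the lacp module logs more, the only way to confirm the negotiation state is to read the `ovs-appctl lacp/show` output by hand. The following is a minimal sketch of how that check could be scripted. It assumes the output layout shown above, and the helper names (`member_states`, `fully_negotiated`) and the choice of `synchronized`, `collecting` and `distributing` as the flags that indicate a healthy member are illustrative assumptions, not part of OVS.

```python
# Flags assumed (for this sketch) to indicate a member that is in sync
# and passing traffic; adjust to taste.
REQUIRED_FLAGS = {"synchronized", "collecting", "distributing"}

# Abbreviated copy of the lacp/show output from this report.
SAMPLE = """\
---- lacp-bond ----
status: active negotiated
member: enp4s0f0np0: current attached
actor state: activity timeout aggregation synchronized collecting distributing
partner state: activity timeout aggregation synchronized collecting distributing
member: enp4s0f1np1: current attached
actor state: activity timeout aggregation synchronized collecting distributing
partner state: activity timeout aggregation synchronized collecting distributing
"""


def member_states(text):
    """Return {member: {"actor": flags, "partner": flags}} from lacp/show text."""
    members = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # real output is indented; ignore leading whitespace
        if line.startswith("member:"):
            # Line looks like "member: enp4s0f0np0: current attached".
            current = line.split(":", 2)[1].strip()
            members[current] = {}
        elif current and line.startswith(("actor state:", "partner state:")):
            role, flags = line.split(" state:", 1)
            members[current][role] = set(flags.split())
    return members


def fully_negotiated(states):
    """Map each member to True if both actor and partner carry all required flags."""
    return {
        name: all(REQUIRED_FLAGS <= roles.get(role, set())
                  for role in ("actor", "partner"))
        for name, roles in states.items()
    }


print(fully_negotiated(member_states(SAMPLE)))
```

In practice the text would come from running `ovs-appctl lacp/show` on the host rather than from a pasted sample. A script like this only reports the end state; it cannot show the intermediate state-machine transitions, which is the gap this request is about.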
I expected the "lacp" module to emit more messages at DBG level so that it is possible to understand what is going on in the LACP state machine.
Version-Release number of selected component (if applicable):
openvswitch3.0-3.0.0-28.el9fdp.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Configure an OVS LACP bond.
2. Perform a link failover.
Actual results:
No log messages from the lacp module indicate what is happening with LACP synchronization.
Expected results:
The lacp module should log more detail to help with troubleshooting.
Additional info:
I tested this with the OVS kernel datapath; however, the same should apply to the OVS-DPDK datapath as well.
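The absence of lacp messages reported under "Actual results" can be checked mechanically by tallying log lines per module. The sketch below assumes the `timestamp|sequence|module|level|message` line layout seen in the excerpts above; the function name `count_by_module` is hypothetical, and the sample is a subset of the lines quoted in this report.

```python
from collections import Counter

# A few lines copied from the log excerpts in this report.
SAMPLE_LOG = """\
2023-07-17T10:45:31.672Z|06257|bond|INFO|member enp4s0f0np0: link state down
2023-07-17T10:45:31.672Z|06258|bond|INFO|member enp4s0f0np0: disabled
2023-07-17T10:45:31.672Z|06259|bond|INFO|bond lacp-bond: active member is now enp4s0f1np1
2023-07-17T10:45:31.673Z|08614|bond(revalidator7)|DBG|bond lacp-bond: member enp4s0f0np0: main thread has not yet enabled member
2023-07-17T10:45:38.686Z|06260|bond|DBG|bond lacp-bond: enp4s0f1np1 0kB
"""


def count_by_module(log_text):
    """Count log lines per (module, level).

    A thread suffix such as "(revalidator7)" is stripped from the module name.
    """
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split("|", 4)  # message may itself contain "|"
        if len(parts) != 5:
            continue  # skip anything that is not in the expected layout
        _timestamp, _seq, module, level, _message = parts
        module = module.split("(", 1)[0]
        counts[(module, level)] += 1
    return counts


counts = count_by_module(SAMPLE_LOG)
lacp_lines = sum(n for (module, _level), n in counts.items() if module == "lacp")
print(dict(counts))
print("lacp lines:", lacp_lines)
```

Run against the full ovs-vswitchd log for the failover window, a tally like this would show entries for the bond module only, even though both bond and lacp were set to DBG for the file destination.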