  1. Fast Datapath Product
  2. FDP-778

RFE: Add more debug logs for lacp

    • Epic
    • Resolution: Unresolved
    • Minor
    • None
    • None
    • ovs-dpdk
    • RFE: Add more debug logs for lacp
    • 5
    • False
    • False

      Please mark each item below with ( / ) if completed or ( x ) if incomplete:

      ( ) The acceptance criteria defined below are met.

      Given an OVS LACP bond is configured, 

      When LACP debug logging is enabled and a link failover occurs, 

      Then the debug logs include detailed information about the LACP state machine's behavior, and they provide enough information to troubleshoot common LACP issues without additional diagnostic steps.


      ( ) The epic's work is available in a downstream build (nightly/async or other)


      ( ) Test coverage is available in downstream CI if applicable


      ( ) All cards under the epic have been moved to Done


      ( ) Failed Test Plans have bugs added as children to the epic/feature.

    • rhel-net-ovs-dpdk
    • 100% To Do, 0% In Progress, 0% Done
    • ssg_networking

      This epic tracks all the effort needed to deliver the solution related to the feature request described below.
      Original Bugzilla ticket:
      Description of problem:

      While troubleshooting an OVS (kernel datapath, no DPDK) LACP bond issue, I enabled debug logging for the two modules below.

      [root@computesriov-0 openvswitch]# ovs-appctl vlog/list
                       console    syslog    file
                       -------    ------    ------
      bond               OFF        ERR       DBG
      lacp               OFF        ERR       DBG
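
      For reference, per-module levels like those listed above can be set with ovs-appctl vlog/set, whose spec takes the form module:destination:level. The ticket does not record the exact commands that were run, so the Python sketch below only builds plausible command lines; the helper name vlog_set_argv is hypothetical, and nothing is executed.

```python
import shlex
from typing import List

# Destinations and levels accepted by vlog/set.
DESTINATIONS = {"console", "syslog", "file"}
LEVELS = {"off", "emer", "err", "warn", "info", "dbg"}


def vlog_set_argv(module: str, destination: str, level: str) -> List[str]:
    """Build (but do not run) an 'ovs-appctl vlog/set' command line.

    Hypothetical helper for illustration; it is not part of OVS.
    """
    if destination not in DESTINATIONS:
        raise ValueError(f"unknown destination: {destination!r}")
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    return ["ovs-appctl", "vlog/set", f"{module}:{destination}:{level}"]


if __name__ == "__main__":
    # The two modules that show DBG in the "file" column above.
    for module in ("bond", "lacp"):
        print(shlex.join(vlog_set_argv(module, "file", "dbg")))
```

      To apply a command for real, the returned list could be passed to subprocess.run on a host where ovs-vswitchd is running.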

      I then triggered link failures at the uplink switch.

      With both member interfaces up:

      2023-07-17T10:45:18.232Z|06252|bond|DBG|bond lacp-bond: enp4s0f0np0 0kB, enp4s0f1np1 0kB
      2023-07-17T10:45:28.241Z|06256|bond|DBG|bond lacp-bond: enp4s0f0np0 0kB, enp4s0f1np1 0kB

      Brought down one member interface at the uplink switch.

      2023-07-17T10:45:31.672Z|06257|bond|INFO|member enp4s0f0np0: link state down
      2023-07-17T10:45:31.672Z|06258|bond|INFO|member enp4s0f0np0: disabled
      2023-07-17T10:45:31.672Z|06259|bond|INFO|bond lacp-bond: active member is now enp4s0f1np1
      2023-07-17T10:45:31.673Z|08614|bond(revalidator7)|DBG|bond lacp-bond: member enp4s0f0np0: main thread has not yet enabled member
      2023-07-17T10:45:31.679Z|08615|bond(revalidator7)|DBG|bond lacp-bond: member enp4s0f0np0: admissibility verdict is to drop pkt, active member: false, may_enable: false, enabled: false, LACP status: negotiated
      2023-07-17T10:45:38.686Z|06260|bond|DBG|bond lacp-bond: enp4s0f1np1 0kB
      2023-07-17T10:45:48.696Z|06261|bond|DBG|bond lacp-bond: enp4s0f1np1 0kB

      Brought down the second member interface.

      2023-07-17T10:45:53.835Z|06262|bond|INFO|member enp4s0f1np1: link state down
      2023-07-17T10:45:53.835Z|06263|bond|INFO|member enp4s0f1np1: disabled
      2023-07-17T10:45:53.835Z|06264|bond|INFO|bond lacp-bond: all members disabled

      Brought up the first member interface.

      2023-07-17T10:46:28.543Z|06271|bond|INFO|member enp4s0f0np0: link state up
      2023-07-17T10:46:28.543Z|06272|bond|INFO|member enp4s0f0np0: enabled
      2023-07-17T10:46:28.543Z|06273|bond|INFO|bond lacp-bond: active member is now enp4s0f0np0
      2023-07-17T10:46:36.065Z|06274|bond|DBG|bond lacp-bond: enp4s0f0np0 0kB
      2023-07-17T10:46:46.075Z|06275|bond|DBG|bond lacp-bond: enp4s0f0np0 0kB

      Brought up the second member interface.

      2023-07-17T10:46:53.055Z|06276|bond|INFO|member enp4s0f1np1: link state up
      2023-07-17T10:46:53.055Z|06277|bond|INFO|member enp4s0f1np1: enabled
      2023-07-17T10:46:56.559Z|06278|bond|DBG|bond lacp-bond: enp4s0f0np0 0kB, enp4s0f1np1 0kB
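
      The gap is easy to see when the excerpts above are grouped by log module. The Python sketch below assumes each line is laid out as timestamp|sequence|module|level|message, as the excerpts appear to be, with an optional thread name in parentheses after the module. The names VlogRecord, parse_vlog_line and count_by_module are hypothetical and exist only for this illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class VlogRecord(NamedTuple):
    timestamp: str
    seq: int
    module: str            # e.g. "bond"
    thread: Optional[str]  # e.g. "revalidator7"; None when no thread is shown
    level: str             # e.g. "INFO", "DBG"
    message: str


def parse_vlog_line(line: str) -> Optional[VlogRecord]:
    """Parse one 'timestamp|seq|module|level|message' line; None if it does not fit."""
    parts = line.strip().split("|", 4)  # the message itself may contain '|'
    if len(parts) != 5 or not parts[1].isdigit():
        return None
    timestamp, seq, module, level, message = parts
    thread = None
    if module.endswith(")") and "(" in module:
        module, _, rest = module.partition("(")
        thread = rest[:-1]
    return VlogRecord(timestamp, int(seq), module, thread, level, message)


def count_by_module(lines: Iterable[str]) -> Counter:
    """Count parsed records per (module, level) pair, skipping unparsable lines."""
    counts: Counter = Counter()
    for line in lines:
        rec = parse_vlog_line(line)
        if rec is not None:
            counts[(rec.module, rec.level)] += 1
    return counts


# A few lines copied from the first failover excerpt above.
SAMPLE = """\
2023-07-17T10:45:31.672Z|06257|bond|INFO|member enp4s0f0np0: link state down
2023-07-17T10:45:31.672Z|06258|bond|INFO|member enp4s0f0np0: disabled
2023-07-17T10:45:31.672Z|06259|bond|INFO|bond lacp-bond: active member is now enp4s0f1np1
2023-07-17T10:45:31.673Z|08614|bond(revalidator7)|DBG|bond lacp-bond: member enp4s0f0np0: main thread has not yet enabled member
2023-07-17T10:45:38.686Z|06260|bond|DBG|bond lacp-bond: enp4s0f1np1 0kB
"""

if __name__ == "__main__":
    counts = count_by_module(SAMPLE.splitlines())
    for (module, level), n in sorted(counts.items()):
        print(f"{module:6} {level:5} {n}")
    lacp_total = sum(n for (m, _), n in counts.items() if m == "lacp")
    print("lacp records:", lacp_total)
```

      Every record in the sample comes from the bond module; none comes from lacp, even though lacp is set to DBG for the file destination.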

      LACP re-negotiated successfully.

      [root@computesriov-0 tripleo-admin]# ovs-appctl lacp/show
      ---- lacp-bond ----
      status: active negotiated
      sys_id: 04:3f:72:d9:c0:48
      sys_priority: 65534
      aggregation key: 1
      lacp_time: fast

      member: enp4s0f0np0: current attached
      port_id: 2
      port_priority: 65535
      may_enable: true

      actor sys_id: 04:3f:72:d9:c0:48
      actor sys_priority: 65534
      actor port_id: 2
      actor port_priority: 65535
      actor key: 1
      actor state: activity timeout aggregation synchronized collecting distributing

      partner sys_id: c8:fe:6a:f2:44:00
      partner sys_priority: 127
      partner port_id: 5
      partner port_priority: 127
      partner key: 5
      partner state: activity timeout aggregation synchronized collecting distributing

      member: enp4s0f1np1: current attached
      port_id: 1
      port_priority: 65535
      may_enable: true

      actor sys_id: 04:3f:72:d9:c0:48
      actor sys_priority: 65534
      actor port_id: 1
      actor port_priority: 65535
      actor key: 1
      actor state: activity timeout aggregation synchronized collecting distributing

      partner sys_id: c8:fe:6a:f2:44:00
      partner sys_priority: 127
      partner port_id: 6
      partner port_priority: 127
      partner key: 5
      partner state: activity timeout aggregation synchronized collecting distributing
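
      The words on the "actor state" and "partner state" lines correspond to the flag bits of the LACP state octet. The Python sketch below maps between the two and reports which flags differ between two snapshots, which is roughly what has to be done by hand today by comparing successive lacp/show runs. The bit values follow the Actor_State layout of IEEE 802.1AX; "defaulted" and "expired" do not appear in the output above and are taken from the standard. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Flag bits of the LACP Actor_State / Partner_State octet (IEEE 802.1AX).
# The first six names match the words printed by lacp/show above; the last
# two are not shown there and are named after the standard's flags.
LACP_STATE_BITS = [
    (0x01, "activity"),
    (0x02, "timeout"),
    (0x04, "aggregation"),
    (0x08, "synchronized"),
    (0x10, "collecting"),
    (0x20, "distributing"),
    (0x40, "defaulted"),
    (0x80, "expired"),
]


def decode_state(state: int) -> List[str]:
    """Return the names of the flags set in a state octet, lowest bit first."""
    if not 0 <= state <= 0xFF:
        raise ValueError("state must fit in one octet")
    return [name for bit, name in LACP_STATE_BITS if state & bit]


def encode_state(names: Iterable[str]) -> int:
    """Return the state octet for a collection of flag names."""
    lookup = {name: bit for bit, name in LACP_STATE_BITS}
    state = 0
    for name in names:
        if name not in lookup:
            raise ValueError(f"unknown LACP state flag: {name!r}")
        state |= lookup[name]
    return state


def changed_flags(old: int, new: int) -> Tuple[List[str], List[str]]:
    """Return (gained, lost) flag names when moving from old to new."""
    return decode_state(new & ~old & 0xFF), decode_state(old & ~new & 0xFF)


if __name__ == "__main__":
    shown = "activity timeout aggregation synchronized collecting distributing"
    state = encode_state(shown.split())
    print(hex(state))
    # Made-up earlier snapshot, for illustration only (not from the ticket).
    earlier = encode_state(["activity", "timeout", "aggregation", "defaulted"])
    gained, lost = changed_flags(earlier, state)
    print("gained:", gained, "lost:", lost)
```

      For the actor state shown above, the sketch yields 0x3f.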

      [root@computesriov-0 tripleo-admin]# ovs-appctl bond/show
      ---- lacp-bond ----
      bond_mode: balance-slb
      bond may use recirculation: no, Recirc-ID : -1
      bond-hash-basis: 0
      lb_output action: disabled, bond-id: -1
      all members active: false
      updelay: 0 ms
      downdelay: 0 ms
      next rebalance: 9098 ms
      lacp_status: negotiated
      lacp_fallback_ab: true
      active-backup primary: <none>
      active member mac: 04:3f:72:d9:c0:48(enp4s0f0np0)

      member enp4s0f0np0: enabled
      active member
      may_enable: true

      member enp4s0f1np1: enabled
      may_enable: true

      [root@computesriov-0 tripleo-admin]#

      I expected the "lacp" log module to emit more debug messages, so that it is possible to understand what is going on in the LACP state machine.

      Version-Release number of selected component (if applicable):
      openvswitch3.0-3.0.0-28.el9fdp.x86_64

      How reproducible:
      100%

      Steps to Reproduce:
      1. Configure an OVS LACP bond.
      2. Perform a link failover.

      Actual results:
      No logs indicate what is happening with LACP synchronization.

      Expected results:
      The lacp module should emit more logs to help with troubleshooting.

      Additional info:
      I performed this test with the OVS kernel datapath; however, the same would be true for the OVS-DPDK datapath.

              ovsdpdk-bot ovsdpdk bot
              rh-ee-mpattric Mike Pattrick