Fast Datapath Product / FDP-1878

OVN dynamic-routing - ovn-controller continuously tries and fails to create invalid VRFs

    • Epic
    • Resolution: Unresolved
    • Normal
    • OVN

      Please mark each item below with ( / ) if completed or ( x ) if incomplete:

      ( ) The acceptance criteria defined below are met.

      Given OVN dynamic routing is configured with an invalid VRF table ID,
      When ovn-controller attempts to create the VRF,
      Then it logs a warning about the invalid table ID and stops retrying, and CPU usage returns to normal levels with no continuous retry loop.


      ( ) The epic's work is available in a downstream build (nightly/Async or other)


      ( ) All cards under the epic have been moved to Done

    • In Progress
    • rhel-9
    • rhel-net-ovn
    • 0% To Do, 0% In Progress, 100% Done
    • ssg_networking

      This epic tracks all the effort needed to deliver the solution related to the bug described below.

       Problem Description: Clearly explain the issue.

      When ovn-controller is misconfigured with an invalid VRF ID (https://github.com/ovn-org/ovn/blob/c73f7912665d6a6e935c7c8aeae8389c059e9ddb/controller/route-exchange-netlink.c#L41), it tries to create the VRF and fails.

      It then continuously retries, wasting CPU on operations that would fail anyway.
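
As a rough illustration of which IDs are affected, the sketch below pre-checks a tunnel key before it is used as requested-tnl-key. It assumes that the linked check rejects the Linux reserved routing table IDs (RT_TABLE_UNSPEC 0, RT_TABLE_COMPAT 252, RT_TABLE_DEFAULT 253, RT_TABLE_MAIN 254, RT_TABLE_LOCAL 255, RT_TABLE_MAX 4294967295) and that the tunnel key is used directly as the VRF table ID, as the ovnvrf253 warnings in this report suggest. The helper name vrf_table_id_valid is hypothetical and not part of OVN.

```shell
#!/bin/sh
# Hypothetical pre-check helper; not part of OVN.
# Returns 0 if the ID looks usable as a VRF table ID, 1 otherwise.
# Assumption: the rejected set is the kernel's reserved RT_TABLE_* values.
# Input is expected in plain decimal without leading zeros.
vrf_table_id_valid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or not a non-negative integer
    esac
    case "$1" in
        # UNSPEC, COMPAT, DEFAULT, MAIN, LOCAL, MAX
        0|252|253|254|255|4294967295) return 1 ;;
    esac
    return 0
}

# Example: 253 is the key used in the reproduction steps of this report.
for key in 100 253; do
    if vrf_table_id_valid "$key"; then
        echo "$key: usable as VRF table ID"
    else
        echo "$key: reserved or invalid, choose another requested-tnl-key"
    fi
done
```

Such a check only helps avoid the misconfiguration; it does not address the retry loop in ovn-controller itself.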

       Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).

      Low severity: the issue occurs only when an invalid VRF ID is configured.

       Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).

      ovn25.03, including ovn25.03-25.03.1-86.el9fdp

        Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).

      Day one issue

       Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.

      Yes.

       Reproduction Steps: Provide detailed steps or scripts to replicate the issue.

      Apply the following OVN configuration and check the ovn-controller logs on chassis-1:

      ovn-nbctl lr-add lr                                          \
        -- set logical_router lr options:dynamic-routing=true      \
                                 options:chassis=chassis-1         \
                                 options:requested-tnl-key=253     \
        -- lrp-add lr lrp 00:00:00:00:00:01 1.1.1.1/24             \
          -- lrp-set-options lrp dynamic-routing-maintain-vrf=true \
        -- ls-add ls                                               \
          -- lsp-add ls lsp                                        \
          -- lsp-set-type lsp router                               \
          -- lsp-set-addresses lsp router                          \
          -- lsp-set-options lsp router-port=lrp
       

       Expected Behavior: Describe what should happen under normal circumstances.

      ovn-controller should log a warning that VRF ID 253 is not valid, but it should not keep retrying to configure the invalid value.

       Observed Behavior: Explain what actually happens.

      ovn-controller loops and retries the invalid configuration continuously, hogging CPU:

      2025-10-20T10:39:16.565Z|00077|route_exchange|WARN|Unable to create VRF ovnvrf253 for datapath 253: Invalid argument.
      2025-10-20T10:39:16.566Z|00078|route_exchange_netlink|WARN|attempt to create VRF using invalid table id 253
      2025-10-20T10:39:16.566Z|00079|route_exchange|WARN|Unable to create VRF ovnvrf253 for datapath 253: Invalid argument.
      2025-10-20T10:39:16.566Z|00080|route_exchange_netlink|WARN|attempt to create VRF using invalid table id 253
      2025-10-20T10:39:16.566Z|00081|route_exchange|WARN|Unable to create VRF ovnvrf253 for datapath 253: Invalid argument.
      2025-10-20T10:39:20.623Z|00082|poll_loop|INFO|wakeup due to 0-ms timeout at controller/route-exchange.c:256 (98% CPU usage) 
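
To gauge how often the failure repeats, the warnings can be counted per VRF. The sketch below is one way to do that, assuming the log line format shown above; the function name count_vrf_failures is hypothetical and not part of OVN. The sample input is the set of warning lines copied from this report.

```shell
#!/bin/sh
# Hypothetical helper, not part of OVN: reads ovn-controller log lines on
# stdin and prints "<count> <vrf-name>" for each VRF whose creation failed.
count_vrf_failures() {
    # Keep only the route_exchange "Unable to create VRF" warnings and
    # reduce each one to the VRF name, then count occurrences per name.
    sed -n 's/.*|route_exchange|WARN|Unable to create VRF \([^ ]*\) for datapath.*/\1/p' \
        | sort | uniq -c | awk '{ print $1, $2 }'
}

# Sample input: warning lines copied from this report.
count_vrf_failures <<'EOF'
2025-10-20T10:39:16.565Z|00077|route_exchange|WARN|Unable to create VRF ovnvrf253 for datapath 253: Invalid argument.
2025-10-20T10:39:16.566Z|00078|route_exchange_netlink|WARN|attempt to create VRF using invalid table id 253
2025-10-20T10:39:16.566Z|00079|route_exchange|WARN|Unable to create VRF ovnvrf253 for datapath 253: Invalid argument.
2025-10-20T10:39:16.566Z|00080|route_exchange_netlink|WARN|attempt to create VRF using invalid table id 253
2025-10-20T10:39:16.566Z|00081|route_exchange|WARN|Unable to create VRF ovnvrf253 for datapath 253: Invalid argument.
EOF
# prints: 3 ovnvrf253
```

On a live chassis the function could be fed the ovn-controller log file instead; its location depends on the installation. A count that keeps growing between runs indicates the retry loop is still active.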

       Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.

       

       Logs: If you collected logs please provide them (e.g. sos report, /var/log/openvswitch/* , testpmd console)

              ovnteam@redhat.com OVN Team
              dceara@redhat.com Dumitru Ceara
              OVN QE OVN QE
              OVN