  1. Fast Datapath Product
  2. FDP-1292

ovn-controller starts configuring ovs even though flow-restore-wait is set to true

    • Bug
    • Resolution: Won't Do
    • Normal
    • OVN, ovn24.03
    • 8
    • False
    • False

      Given that flow-restore-wait is set to true,

      When ovn-controller starts,

      Then it must either defer OpenFlow configuration until flow restoration is complete or use a separate controller socket.

    • rhel-9
    • rhel-net-ovn
    • ssg_networking
    • OVN FDP Sprint 9
    • 1
    • Important

       Problem Description:

      If ovn-controller starts at the same time as ovs-vswitchd while flow restoration is still in progress, the restoration fails with the following error:

      ++ ovs-ofctl -O OpenFlow14 add-tlv-map br-int '{class=0x102,type=0x80,len=4}->tun_metadata0'
      2025-04-04T10:27:24Z|00045|connmgr|INFO|br-int<->unix#11: sending NXTTMFC_ALREADY_MAPPED error reply to NXT_TLV_TABLE_MOD message
      OFPT_ERROR (OF1.4) (xid=0x2): NXTTMFC_ALREADY_MAPPED
      NXT_TLV_TABLE_MOD (OF1.4) (xid=0x2):
       ADD mapping table:
        class  type  length  match field
       ------  ----  ------  --------------
        0x102  0x80       4  tun_metadata0
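
      To confirm that the mapping was already present before the restore ran, the TLV table can be inspected with ovs-ofctl dump-tlv-map. The helper below is only a sketch: tlv_field_is_mapped and the OVS_OFCTL override are hypothetical names introduced for this example, and it assumes the dump lists one mapping per row with the match field in the last column, as in the error output above.

```shell
#!/bin/sh
# Sketch: report whether a tunnel metadata field is already mapped on a bridge.
# OVS_OFCTL is a hypothetical override so the binary can be swapped out.
OVS_OFCTL="${OVS_OFCTL:-ovs-ofctl}"

tlv_field_is_mapped() {
    bridge="$1"
    field="$2"
    # dump-tlv-map prints the current mapping table; match on the last column.
    "$OVS_OFCTL" -O OpenFlow14 dump-tlv-map "$bridge" |
        awk -v f="$field" '$NF == f { found = 1 } END { exit !found }'
}
```

      For example, "tlv_field_is_mapped br-int tun_metadata0 && echo already mapped" would show whether another client (presumably ovn-controller in this report) created the mapping first.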

      See this PR comment for more details: https://github.com/openstack-k8s-operators/ovn-operator/pull/422#issuecomment-2778314555

       Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).

      A fix is needed for updates of the ovn-controller-ovs pods in RHOSO, since this issue prevents us from restoring the flows. We are looking for a workaround (making the ovn-controller pod wait for the ovn-controller-ovs pod).
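
      One possible shape for such a workaround is to poll other_config:flow-restore-wait before launching ovn-controller. This is a minimal sketch, not the operator's actual implementation: wait_for_flow_restore, OVS_VSCTL, WAIT_TIMEOUT and WAIT_INTERVAL are hypothetical names, and it assumes the restore procedure removes the key (or sets it to something other than true) once it finishes. It does not cover the case where ovn-controller starts before the key has been set at all.

```shell
#!/bin/sh
# Sketch: block until other_config:flow-restore-wait is no longer "true".
# OVS_VSCTL, WAIT_TIMEOUT and WAIT_INTERVAL are hypothetical knobs for this example.
OVS_VSCTL="${OVS_VSCTL:-ovs-vsctl}"

wait_for_flow_restore() {
    timeout="${WAIT_TIMEOUT:-120}"   # seconds before giving up
    interval="${WAIT_INTERVAL:-1}"   # seconds between polls
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # With --if-exists, a missing key yields an empty line instead of an error.
        value=$("$OVS_VSCTL" --if-exists get Open_vSwitch . \
            other_config:flow-restore-wait 2>/dev/null) || value="unreachable"
        # String values may be printed in double quotes; strip them.
        value=$(printf '%s' "$value" | tr -d '"')
        case "$value" in
            true|unreachable) ;;     # still restoring, or the database is not up yet
            *) return 0 ;;           # key absent or not "true": safe to proceed
        esac
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1                         # timed out while restoration was still pending
}
```

      An entrypoint could then run "wait_for_flow_restore && exec ovn-controller ..." so that ovn-controller does not touch br-int while the flows are being restored.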

       Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).

      Environment: CRC + openstack operators

      ovn-controller 24.03.6
      Open vSwitch Library 3.3.4
      OpenFlow versions 0x6:0x6
      SB DB Schema 20.33.0

       

        Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).

       

       Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.

      100% in this environment. When a change is introduced in ovn-operator, all pods controlled by ovn-operator are recreated. As a result, ovn-controller and ovn-controller-ovs start at the same time and the flow restoration in the vswitchd container fails.

       Reproduction Steps: Provide detailed steps or scripts to replicate the issue.

      Deploy CRC with the OpenStack operators.

      Clone ovn-operator locally with this PR included: https://github.com/openstack-k8s-operators/ovn-operator/pull/422/commits/f93b2c1f0849196ec9351cc413cc1e28cd9479db

      Make a change to ovn-operator so that both pods are recreated.
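
      The clone step could look roughly like the following. This is a sketch only: fetch_ovn_operator_pr, the pr-422 branch name and the GIT override are hypothetical, and it assumes the commit referenced above is still reachable from the pull request head (it may not be if the PR was force-pushed since).

```shell
#!/bin/sh
# Sketch: clone ovn-operator and check out the commit from PR 422.
# GIT is a hypothetical override so the git binary can be swapped out.
GIT="${GIT:-git}"

fetch_ovn_operator_pr() {
    dest="${1:-ovn-operator}"
    "$GIT" clone https://github.com/openstack-k8s-operators/ovn-operator.git "$dest" &&
    # GitHub exposes each pull request head under refs/pull/<number>/head.
    "$GIT" -C "$dest" fetch origin pull/422/head:pr-422 &&
    "$GIT" -C "$dest" checkout f93b2c1f0849196ec9351cc413cc1e28cd9479db
}
```

      Running "fetch_ovn_operator_pr" with no argument clones into ./ovn-operator; building and deploying the operator from that checkout is environment-specific and not shown here.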

      (I can provide access to this environment)

       Expected Behavior: Describe what should happen under normal circumstances.

      vswitchd doesn't fail and the flow restoration is completed.

       Observed Behavior: Explain what actually happens.

      Instead, the restoration fails, the pod restarts, and all the restoration data is lost.

       Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.

       

       Logs: If you collected logs please provide them (e.g. sos report, /var/log/openvswitch/* , testpmd console)

      vswitchd logs: https://paste.opendev.org/show/b4uVxSJu37NSuE96tOCb/

      ovn-controller logs: https://paste.opendev.org/show/b3P1bkvUNQl6012pqPyz/

              amusil@redhat.com Ales Musil
              egarciar@redhat.com Elvira Garcia
              Jianlin Shi