Red Hat OpenStack Services on OpenShift / OSPRH-893

BZ#2244997 Add FDB aging mechanism to OSP18

    • OSPRH-811 - Red Hat OpenStack 18.0 Greenfield Deployment
    • openstack-neutron-22.0.3-18.0.20231110074647.60b4c47.el9osttrunk
    • 0% To Do, 50% In Progress, 50% Done
    • Automated

      +++ This bug was initially created as a clone of Bug #2224492 +++

      Description of problem:

      OSP17.1 has a new feature, `localnet_learn_fdb`[1], but doesn't have an FDB aging mechanism.

      On the other hand, there is an RFE[2] for an FDB aging mechanism in OVN (FDP).

      One of our telco customers needs to use `localnet_learn_fdb` in OSP17.1 because of the flooding issue described in BZ#2173575[1].

      However, they don't know how to resolve the flooding issue, because OSP17.1 has the `localnet_learn_fdb` feature but no aging mechanism.

      Would it be possible for us to implement the FDB aging mechanism in OSP17.1?

      [1] https://bugzilla.redhat.com/show_bug.cgi?id=2173575
      [2] https://bugzilla.redhat.com/show_bug.cgi?id=2179942

      — Additional comment from Miguel Lavalle on 2023-07-24 18:38:44 UTC —

      @gurpsing@redhat.com we discussed this RFE during the Neutron squad bug triage. This is an RFE that requires prioritization from product management.

      — Additional comment from Ryo Hayakawa on 2023-07-27 08:04:12 UTC —

      Hello,

      The customer wants to know about a workaround at least, even if we don't have the aging mechanism on OSP17.1.
      For example, would it be a good workaround to delete FDB entries manually and periodically by using `ovn-sbctl --all destroy fdb`?

      Best regards,
      Ryo Hayakawa

      — Additional comment from Gurpreet Singh on 2023-08-01 08:35:12 UTC —

      Miguel, I had to leave the bug triage meeting early today. Let us sync on the discussion or possible options to address this. As noted in [2] above, the FDP aging will be available in 23.09; the question is how do we get it in 17.1.z or provide a workaround?

      — Additional comment from Ryo Hayakawa on 2023-08-08 08:53:51 UTC —

      Hello team,

      The customer still wants our official workaround for this issue, and I need to explain the progress of this issue in the regular meeting with them.
      I would appreciate any update you can share.

      Best regards,
      Ryo Hayakawa

      — Additional comment from Terry Wilson on 2023-08-09 14:38:17 UTC —

      @amusil@redhat.com Would the workaround in comment 2, just periodically deleting the entries in FDB, be acceptable for the customer until they get a release with the FDB aging mechanism?

      — Additional comment from Ales Musil on 2023-08-10 05:32:31 UTC —

      (In reply to Terry Wilson from comment #5)
      > @amusil@redhat.com Would the workaround in comment 2, just periodically
      > deleting the entries in FDB, be acceptable for the customer until they
      > get a release with the FDB aging mechanism?

      Hi Terry,

      A better solution would be to select which FDB entries get deleted. However, I understand that might not always be possible.
      So yeah, in that case removing everything should work. One thing to note is that traffic might be delayed a bit after the deletion,
      so it's better to do it outside of peak hours if possible.

      Regards,
      Ales

      — Additional comment from Gurpreet Singh on 2023-08-10 18:11:27 UTC —

      Hi Ales

      I assume that there is a way to manually delete the oldest entries. i.e. there is a way to sort or select the oldest ones and automate deleting the oldest ones.

      Regards
      Gurpreet

      — Additional comment from Ryo Hayakawa on 2023-08-14 08:20:44 UTC —

      Hello team,

      Thank you so much for the comments.

      Actually, the customer already tried to run `ovn-sbctl destroy fdb <_uuid>` and `ovn-sbctl --all destroy fdb` in their testing environment, but they are facing the following warning messages:

      ~~~
      [root@ctrxxx ~]# ovn-sbctl destroy fdb xxxxxxxx-ad8a-42b7-accb-c86143774745
      2023-07-04T23:31:19Z|00004|timeval|WARN|Unreasonably long 1449ms poll interval (1240ms user, 180ms system)
      2023-07-04T23:31:19Z|00005|timeval|WARN|faults: 103071 minor, 0 major
      2023-07-04T23:31:19Z|00006|timeval|WARN|context switches: 0 voluntary, 264 involuntary

      [root@ctrxxxxx ~]# ovn-sbctl --all destroy fdb
      2023-07-04T23:36:15Z|00004|timeval|WARN|Unreasonably long 5615ms poll interval (4989ms user, 465ms system)
      2023-07-04T23:36:15Z|00005|timeval|WARN|faults: 247991 minor, 0 major
      2023-07-04T23:36:15Z|00006|timeval|WARN|context switches: 0 voluntary, 876 involuntary
      2023-07-04T23:36:31Z|00027|reconnect|ERR|tcp:10.0.xx.xx:6642: no response to inactivity probe after 5.01 seconds, disconnecting
      ~~~

      However, despite these messages, the FDB entries appear to have been deleted, as far as they could tell from `ovn-sbctl list fdb`.
      Because of the warnings, the customer is concerned about whether we support running `ovn-sbctl destroy fdb`. Do we support it?

      Best regards,
      Ryo Hayakawa

      — Additional comment from Daniel Alvarez Sanchez on 2023-08-14 08:33:06 UTC —

      (In reply to Gurpreet Singh from comment #7)
      > Hi Ales
      >
      > I assume that there is a way to manually delete the oldest entries. i.e.
      > there is a way to sort or select the oldest ones and automate deleting the
      > oldest ones.
      >
      > Regards
      > Gurpreet

      Oldest entries are not necessarily the least disruptive to be deleted as they can be the most used.
      Ideally, LRU entries should be deleted first.
      However, since this is not easy, the oldest get deleted first in the core OVN implementation:

      https://patchwork.ozlabs.org/project/ovn/patch/20230518113248.71715-6-amusil@redhat.com/

      Doing this manually would be possible as the FDB entries have a timestamp:
      https://github.com/ovn-org/ovn/blob/24da428ead48efda19b81381980b031e6b1d1eb0/ovn-sb.ovsschema#L580
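
      As a rough illustration of selecting entries by age rather than destroying everything, the sketch below picks the oldest FDB rows from an already-fetched list. It is not a supported procedure. `FdbEntry` and `select_aged_entries` are hypothetical names, and the code assumes the deployed schema includes the `timestamp` column linked above and that its value is wall-clock milliseconds; both should be checked against the installed OVN version.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FdbEntry:
    """Hypothetical in-memory view of one southbound FDB row."""
    uuid: str
    mac: str
    timestamp_ms: int  # assumption: wall-clock milliseconds stored in the row


def select_aged_entries(
    entries: Iterable[FdbEntry],
    now_ms: int,
    max_age_s: int,
    limit: Optional[int] = None,
) -> List[FdbEntry]:
    """Return entries older than max_age_s, oldest first, capped at limit."""
    cutoff_ms = now_ms - max_age_s * 1000
    aged = sorted(
        (e for e in entries if e.timestamp_ms < cutoff_ms),
        key=lambda e: e.timestamp_ms,
    )
    return aged if limit is None else aged[:limit]


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = [
        FdbEntry("uuid-a", "aa:aa:aa:aa:aa:01", 1_000),
        FdbEntry("uuid-b", "aa:aa:aa:aa:aa:02", 50_000),
        FdbEntry("uuid-c", "aa:aa:aa:aa:aa:03", 5_000),
    ]
    for entry in select_aged_entries(sample, now_ms=100_000, max_age_s=60, limit=10):
        print(entry.uuid)
```

      Each selected UUID could then be passed to the `ovn-sbctl destroy fdb <_uuid>` form quoted earlier in this thread. The `limit` cap keeps each run small, which may help keep the traffic delay after deletion short.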

      I believe we should enable this feature soon in OSP 17.1 to avoid customers going through the manual and risky (and unsupported) process of dealing with the OVN database directly.
      Until then, I believe that we should not allow setting localnet_learn_fdb to 'true'.

      — Additional comment from Daniel Alvarez Sanchez on 2023-08-14 08:46:20 UTC —

      @Terry, once you enable the aging mechanism you will probably have to add the THT support for this, as I can only see it in puppet:

      https://opendev.org/openstack/puppet-neutron/commit/e2b6b6aeea8dff88ae879eedd4adae32bb484ce4

      — Additional comment from Gurpreet Singh on 2023-08-14 21:29:59 UTC —

      As per Ihar's comment on the Slack channel, the workaround is mainly deleting the entries manually using ovn-sbctl. This can be automated, but there are no timestamps associated with the entries, so the entries will be deleted without considering their current age, which will result in intermittent downtime.

      — Additional comment from Gurpreet Singh on 2023-09-11 23:09:27 UTC —

      Luis, what is the earliest target for 17.1.z we can get this addressed, considering the aging mechanism support will be in FDB 23.09 as indicated earlier?

      — Additional comment from Luis Tomas Bolivar on 2023-09-12 06:39:03 UTC —

      (In reply to Gurpreet Singh from comment #12)
      > Luis, what is the earliest target for 17.1.z we can get this addressed
      > considering the aging mechanism support will be in FDB 23.09 as indicated
      > earlier?

      I'm working on the patch on the neutron side, and I hope to have it ready soon. But I'm not sure when 17.1 is going to move to 23.09. As far as I know, there is no date for 17.1.2/17.1.3 either, so it is hard to say at the moment.

              ltomasbo@redhat.com Luis Tomas Bolivar
              jira-bugzilla-migration RH Bugzilla Integration
              Rodolfo Alonso Rodolfo Alonso
              rhos-dfg-networking-squad-neutron