  1. OpenShift Core Networking
  2. CORENET-5350

[4.20] GA OVN Kubernetes support for BGP as a routing protocol: On-Prem

    • [4.19] OVN Kubernetes support for BGP as a routing protocol
    • Product / Portfolio Work
    • OCPSTRAT-1361 BGP for UDN GA [On-prem]
    • 0% To Do, 0% In Progress, 100% Done
    • False
    • False
    • Red

      [Jaime - based on BGP sync on 9th of June]

      We have decided to track the further efforts described in the previous update in epic https://issues.redhat.com/browse/CORENET-6022.
      We will be closing this epic shortly.

      [Jaime - based on Team Meeting on 5th of June]

      • We have a green dashboard to lift the feature gate for 4.20
      • We decided it is OK to lift the feature gate in 4.20 as long as the following limitations are solved before lifting the feature gate in 4.19 and before 4.20 becomes GA:
        • Support BGP for L2 networks in LGW mode
        • Support the following CVS requirements:
          • Implement a configuration knob to disable UDN isolation
          • Ensure inter-UDN traffic egresses the cluster

      [Jaime - based on BGP sync on May 29th]

      • No outstanding known bugs blocking feature GA
      • All downstream tests are in and running in our jobs
      • However, we are looking into CI flakes impacting the pass rate that is evaluated when considering the feature to be GA quality
        • [High Impact] One unrelated knmstate bug was worked around - https://github.com/openshift/origin/pull/29853
        • [Low Impact] One OVN-K crash was fixed - https://github.com/openshift/ovn-kubernetes/pull/2588
        • [Low Impact] A cluster node loses connectivity, unknown relationship with feature, under investigation
      • We will batch some jobs today to get our pass rate up
      • Although there is a non-zero chance that the batch gets our pass rate into the green, it is likely we will roll over into sprint 272.
      • The issue where a cluster node loses connectivity has low impact on our CI pass rates but high severity, so it needs to be looked at.

      [Jaime - based on BGP sync on May 8th]

      • Items tracked for GA:
        • Downstream EIP, UDN L2 and VRF-Lite tests: blocked due to metal jobs permafailing since branching, otherwise ready to go
        • OCPBUGS-52462: fix ready and waiting on OVN hotfix, but already tested and verified by QE
        • OCPBUGS-55926: verification of the previous bug uncovered a different issue that we consider a blocker and that needs to be looked into
        • OCPBUGS-55962: provide a way to turn off UDN isolation

      [Surya - based on BGP sync on April 28th]

      • 2 isolation bugs will be merged this week (merged on 29th April); the 3rd one depends on an OVN fix/bump: https://issues.redhat.com/browse/FDP-1321
      • Downstream tests:
        • https://github.com/openshift/origin/pull/29617 - EIP tests and helper changes
        • https://github.com/openshift/origin/pull/29727 - L2 UDN tests
        • Jaime's VRFLite PRs
      • 4.19.0 has slipped, working towards 4.20 and 4.19.z lifting within 271 sprint

      [Jaime - based on BGP sync on April 24th]
      We continue to deal with the following feature GA blockers:

      • Three isolation bugs, two of which are planned to be fixed shortly, while the third depends on an FDP issue.
      • Downstream tests are progressing, but slowly due to infra issues and metal resource constraints.
      • Three new blocker issues registered in the last week.

      ========

      [Surya - based on BGP sync on April 14th] This epic is in red status because:

      1. Two of the isolation bugs are at risk of not being completed on time: https://issues.redhat.com/browse/OCPBUGS-52462 and https://issues.redhat.com/browse/OCPBUGS-52278
      2. Downstream tests are at risk of not being completed on time: https://issues.redhat.com/browse/CORENET-5875, https://issues.redhat.com/browse/CORENET-5876 and https://issues.redhat.com/browse/CORENET-5854 - we have merged some tests, they are showing up in Sippy, and we are digging into the flakes.

      ========

      [Surya - based on BGP sync on April 10th] This epic is in red status because:

      1. Two of the isolation bugs are at risk of not being completed on time: https://issues.redhat.com/browse/OCPBUGS-52462 and https://issues.redhat.com/browse/OCPBUGS-52278, as is the routes disappearing bug https://issues.redhat.com/browse/OCPBUGS-52194
      2. Downstream tests are at risk of not being completed on time: https://issues.redhat.com/browse/CORENET-5875, https://issues.redhat.com/browse/CORENET-5876 and https://issues.redhat.com/browse/CORENET-5854 - we don't have any tests merged into origin as of 10th April

      Other caveats to remember:

      1. LGW and L2 Advertised UDNs will not be done: known limitation will be documented.
      2. ETP=local, NodePorts are funky and won't work with BGP advertised networks: known limitation will be documented.
      3. Remaining question mark is QE: if FG lifting is moved to 25th April, is that enough time for QE to do their bits? We won't be removing the FG by the 18th for sure.
      4. Ask PMs in today's PM Sync call about the CNV BGP Testing Epic: should it block our FG, or can we treat any potential CNV issues as just bugs?
    • None
    • 0

      Epic Goal

      • Left over from 4.18 (potentially BGP+UDN, egress IP)
      • perf/scale
      • UX fixes (ovnk specific API)
      • Enabling BGP advertisement of the pod network from a selected subset of nodes (requirement from customers)

      Why is this important?

      Planning Done Checklist

      The following items must be completed on the Epic prior to moving the Epic from Planning to the ToDo status

      • Priority is set by engineering
      • Epic must be linked to a Parent Feature
      • Target version must be set
      • Assignee must be set
      • Enhancement Proposal is Implementable
      • No outstanding questions about major work breakdown
      • Are all Stakeholders known? Have they all been notified about this item?
      • Does this epic affect SD? Have they been notified? (View plan definition for current suggested assignee)
        1. Please use the “Discussion Needed: Service Delivery Architecture Overview” checkbox to facilitate the conversation with SD Architects. The SD architecture team monitors this checkbox which should then spur the conversation between SD and epic stakeholders. Once the conversation has occurred, uncheck the “Discussion Needed: Service Delivery Architecture Overview” checkbox and record the outcome of the discussion in the epic description here.
        2. The guidance here is that unless it is very clear that your epic doesn’t have any managed services impact, default to use the Discussion Needed checkbox to facilitate that conversation.

      Additional information on each of the above items can be found here: Networking Definition of Planned

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.

      ...

      Dependencies (internal and external)

      1.

      ...

      Previous Work (Optional):

      1. …

      Open questions:

      1. …

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

              jcaamano@redhat.com Jaime Caamaño Ruiz
              ddharwar@redhat.com Deepthi Dharwar (Inactive)
              None
              Arnab Ghosh, Martin Kennelly, Meina Li, Patryk Diak, Peng Liu, Surya Seetharaman, Ying Wang, Zhanqi Zhao
              Jean Chen
              Jason Boxman