• cudn-localnet-4.19
    • Strategic Portfolio Work
    • False
    • None
    • False
    • Green
    • On Track
    • In Progress
    • OCPSTRAT-1847 - Post GA UDN Improvements
    • 17% To Do, 33% In Progress, 50% Done
    • ---
    • 0

      Template:

      Networking Definition of Planned

      Epic Template descriptions and documentation

      Epic Goal

      Provide quality user experience for customers connecting their Pods and VMs to the underlying physical network through OVN Kubernetes localnet.

      Why is this important?

      This is a continuation of https://issues.redhat.com/browse/SDN-5313.

      It covers the UDN API for localnet and other improvements.

      Planning Done Checklist

      The following items must be completed on the Epic prior to moving the Epic from Planning to the ToDo status

      • Priority is set by engineering
      • Epic must be linked to a Parent Feature
      • Target version must be set
      • Assignee must be set
      • Enhancement Proposal is Implementable
      • No outstanding questions about major work breakdown
      • Are all Stakeholders known? Have they all been notified about this item?
      • Does this epic affect SD? Have they been notified? (View plan definition for current suggested assignee)
        1. Please use the “Discussion Needed: Service Delivery Architecture Overview” checkbox to facilitate the conversation with SD Architects. The SD architecture team monitors this checkbox, which should then spur the conversation between SD and epic stakeholders. Once the conversation has occurred, uncheck the “Discussion Needed: Service Delivery Architecture Overview” checkbox and record the outcome of the discussion in the epic description here.
        2. The guidance here is that unless it is very clear that your epic doesn’t have any managed services impact, default to using the Discussion Needed checkbox to facilitate that conversation.

      Additional information on each of the above items can be found here: Networking Definition of Planned

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
        • This must be done downstream too
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • OVN Kubernetes secondary networks with the localnet topology can be created through ClusterUserDefinedNetworks
      • When possible, user input is validated and any configuration issue is shown on the UDN. Alternatively, some issues can be shown in CNI ADD events on the Pod
      • Definition of these networks can be changed even if there are Pods connected to them. When that happens, the UDN is marked as degraded until all the "old" pods are gone. The mutable fields should be: MTU, VLAN, physnet name. For cases where a user incorrectly set their MTU, VLAN, or physnet name, there is a clear and foolproof flow describing how to correct this mistake.
      • A single "bridge-mappings" "localnet" can be referenced from multiple different UDNs
      • The default MTU set for localnet is 1500
      • A Pod requesting a UDN without a VLAN is able to connect to services running on the host's network
      • (stretch) The "physnet" mapping is a "supported API" and available to users, so they can connect to the machine network without needing to configure a custom bridge-mapping. We should just always request users to configure the mapping themselves, until we understand all the implications of non-NORMAL mode on br-ex and how it works with local access / bondings / ...
      • (stretch) Scheduling is managed by the platform - if a UDN requests a localnet (as in bridge-mappings.localnet), the Pod requesting this UDN will only be scheduled on a node with this resource available. This can use the same mechanism as the SR-IOV operator - a combination of device plugins and the "k8s.v1.cni.cncf.io/resourceName" annotation
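      The mutability criterion above can be illustrated with a small sketch. It assumes a simplified, hypothetical `LocalnetSpec` model (the field names are illustrative and are not the real CUDN schema) and treats every field other than MTU, VLAN, and physnet name as immutable, which is an assumption rather than a decision recorded in this epic.

```python
from dataclasses import dataclass, fields, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LocalnetSpec:
    """Hypothetical, simplified localnet definition (not the real CUDN schema)."""
    physical_network_name: str
    mtu: int = 1500  # default localnet MTU named in the acceptance criteria
    vlan: Optional[int] = None
    subnets: Tuple[str, ...] = ()


# Fields the acceptance criteria list as mutable; everything else is treated
# as immutable in this sketch (an assumption).
MUTABLE_FIELDS = frozenset({"mtu", "vlan", "physical_network_name"})


def rejected_changes(old: LocalnetSpec, new: LocalnetSpec) -> List[str]:
    """Return names of fields that changed but are not in MUTABLE_FIELDS."""
    return [
        f.name
        for f in fields(LocalnetSpec)
        if getattr(old, f.name) != getattr(new, f.name)
        and f.name not in MUTABLE_FIELDS
    ]


current = LocalnetSpec(physical_network_name="physnet1", subnets=("192.0.2.0/24",))
# Correcting a wrong MTU and VLAN touches only mutable fields.
print(rejected_changes(current, replace(current, mtu=9000, vlan=100)))
# Changing the subnets is reported as a rejected change.
print(rejected_changes(current, replace(current, subnets=("198.51.100.0/24",))))
```

      A check of this shape could sit in admission validation, so that a user correcting an MTU, VLAN, or physnet name is let through while other edits are surfaced as errors.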

      ...

      IPAM is not in the scope of this epic. See RFE-6947.

      Dependencies (internal and external)

      1.

      ...

      Previous Work (Optional):

      1. …

      Open questions:

      1. …

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

       
       
       

       

            [CORENET-5358] Universal connectivity: Localnet [4.19]

            Riccardo Ravaioli added a comment - edited

            ahardin@redhat.com I think most of the documentation effort will be for https://issues.redhat.com/browse/CORENET-5820 . There's also an existing card for a separate documentation task: https://issues.redhat.com/browse/CORENET-5477 . Thanks!

            Petr Horacek added a comment

            rhn-support-asood makes sense, thanks, much appreciated!

            Petr Horacek added a comment

            rhn-support-asood hi! I don't know what your current test plan is, but could we add a check for UDN immutability? This is to reflect https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5087 . Having the config immutable is critical to address the UX issues raised in https://issues.redhat.com/browse/PLMCORE-10896 .

            Or Mergi added a comment - edited

            Regarding the criterion:

            • Definition of these networks can be changed even if there are Pods connected to them. When that happens, the UDN is marked as degraded until all the "old" pods are gone. The mutable fields should be: MTU, VLAN, physnet name

            Following an offline discussion with phoracek@redhat.com about the "the UDN is marked as degraded until all the "old" pods are gone" part, the expectation is for the CR to have a condition indicating a degraded state when at least one pod requires a restart following a spec mutation.

            Or Mergi added a comment - edited

            Summary of an offline discussion with mduarted@redhat.com about the criterion:

            • Definition of these networks can be changed even if there are Pods connected to them. When that happens, the UDN is marked as degraded until all the "old" pods are gone. The mutable fields should be: MTU, VLAN, physnet name

            What we understood is that when the CR spec is mutated, the CR should have a condition reflecting which pods still run with the previous network settings and require a restart in order to pick up the latest network settings.

            Addressing this would require an initial feasibility check (in the context of the CUDN controller), grooming, and probably its own design.

            phoracek@redhat.com could you please elaborate on this criterion?
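            The condition described in this comment could look roughly like the sketch below. Everything in it is hypothetical: the condition type and reason strings, and the idea that a controller can learn which spec generation each pod was wired with. The feasibility of that tracking is exactly the open question raised above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Condition:
    """Minimal stand-in for a Kubernetes-style status condition."""
    type: str
    status: str
    reason: str
    message: str


def degraded_condition(spec_generation: int, pod_generations: Dict[str, int]) -> Condition:
    """Build a hypothetical 'Degraded' condition after a spec mutation.

    pod_generations maps a pod name to the spec generation that pod was
    wired with; how a controller would learn this is an open question.
    """
    # Pods wired before the latest spec change still use the old settings.
    stale = sorted(p for p, gen in pod_generations.items() if gen < spec_generation)
    if stale:
        return Condition(
            type="Degraded",
            status="True",
            reason="PodsRequireRestart",
            message="pods running with previous network settings: " + ", ".join(stale),
        )
    return Condition("Degraded", "False", "AllPodsUpToDate", "")


# Two pods predate the latest spec change, so the network reports as degraded.
print(degraded_condition(2, {"ns1/vm-b": 1, "ns1/vm-a": 1, "ns2/web": 2}))
```

            Once every listed pod has been restarted, the same function returns a condition with status "False", which matches the "until all the old pods are gone" wording of the criterion.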

            Petr Horacek added a comment

            Removing the SLB label - this epic makes the flow using localnet better, but it is not a blocker

            Petr Horacek added a comment

            mduarted@redhat.com physicalNetworkName needs to be mutable. As long as we require users to remember the bridge mapping name (and understand where to find it) and then type it into the UDN, we must be able to reconcile it. I also care about MTU and VLAN. I care less about IPAM-related attributes, although the one to exclude seems important for day-2.

            Petr Horacek added a comment

            I don't understand why the Pod won't start in the scenario you described - it would help me to talk about it offline.

            Miguel Duarte de Mora Barroso added a comment

            phoracek@redhat.com at the NAD level, we accept mutating every attribute (requiring a VM restart).

            For the localnet cluster-UDN, we need better-defined acceptance criteria for this (in order for us to define which fields are immutable).

            I suggest allowing mutation (at the c-udn CRD) of the following attributes:

            • MTU
            • VLAN
            • excludedIPs / CIDRs
            • physicalNetworkName (maybe !!!)

            Everything else should be off limits. We could maybe do a PoC about disabling IPAM though.

            Miguel Duarte de Mora Barroso added a comment

            phoracek@redhat.com regarding the following entry in the acceptance criteria:

            Definition of these networks can be changed even if there are Pods connected to them. When that happens, the UDN is marked as degraded until all the "old" pods are gone

            I just became aware of the following scenario:

            1. Create nad1 in ns1
            2. Create ns2 and a pod pointing to nad2 that shares the same config as nad1
            3. Create nad2
            4. The pod never starts

            Meaning, we're not idempotent for networks that span multiple namespaces.

            IIUC, there's no problem for cluster UDNs, only when NADs are used directly. But, the limitation still stands.

              rravaiol@redhat.com Riccardo Ravaioli
              pdiak@redhat.com Patryk Diak
              Arti Sood Arti Sood
              Jason Boxman Jason Boxman