Fast Datapath Product / FDP-874

[ovn-ic] Add support for a spine-leaf topology for logical switches, i.e., direct switch-to-switch connections


    • Type: Story
    • Resolution: Unresolved
    • Priority: Normal
    • OVN

      Given a Kubernetes cluster with multiple nodes and pods using OVN for networking where each node has its own local switch and is connected to a central transit switch using switch-to-switch ports,

      When the cluster is configured with the new spine-leaf topology where each node’s local switch only handles local pods and connects to the transit switch for cross-node traffic,

      Then adding new pods should only add logical switch ports to the local switch on the node where the pod is deployed, without affecting other node switches. The system should support up to 32K pods per node, instead of being limited to 32K pods per network by the single transit switch. Multicast traffic should be handled locally by each node switch and forwarded to other nodes only when necessary.

    • rhel-sst-network-fastdatapath
    • ssg_networking

      In the UDN L2 topology of ovn-kubernetes, all the pods of the same network are attached to the same transit switch. This means that, on every node, this switch has one LSP for each pod on that node and one remote LSP for each pod on every other node. So, in a 10-node cluster with 100 pods per node, each node has 1000 ports in the transit logical switch: 100 local and 900 remote ones.
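      The port arithmetic above can be written as a small helper. This is only an illustrative sketch. The function name is hypothetical, and it assumes every node runs the same number of pods.

```python
def flat_ts_ports_per_node(nodes: int, pods_per_node: int) -> tuple[int, int, int]:
    """Return (local, remote, total) LSP counts one node sees on the shared transit switch.

    Hypothetical helper; assumes every node runs the same number of pods.
    """
    local = pods_per_node  # one LSP per pod scheduled on this node
    remote = (nodes - 1) * pods_per_node  # one remote LSP per pod on every other node
    return local, remote, local + remote


print(flat_ts_ports_per_node(nodes=10, pods_per_node=100))  # (100, 900, 1000)
```

      The total equals nodes * pods_per_node. Every node therefore carries a port for every pod in the network, and this is the root of the limitations listed below.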

      This topology has limitations:

      1. Adding a new pod anywhere in the cluster means adding a new port to every node in the cluster.
      2. OVN supports up to 32K ports in a single switch, i.e., a single network can't have more than 32K pods, which is a serious limitation.
      3. Multicast traffic can't be delivered to more than a few hundred ports on such a large switch because of OVS resubmit limits.

      A potential solution is to split this one large switch into a spine-leaf topology, with a smaller switch per node that has only local LSPs and a transit switch that connects all these node switches together. The switches should be connected via "localnet"-style switch-to-switch ports, which do not exist in OVN today and are the main new feature here.
      Such a topology should allow:

      1. Locality of changes: Adding new pods should only require adding LSPs to a single node switch on the same node where the pod is starting.
      2. Higher number of pods in the cluster: The 32K port limit would apply to each node switch rather than to the whole network, so we could have up to 32K pods per node instead of 32K across the whole cluster.
      3. Better handling of multicast traffic: The pipeline will be split across nodes, so every node will process multicast for its own pods and forward traffic to the appropriate remote nodes only if necessary.
      4. Still an L2 topology without any routers involved for EW traffic.
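      The locality and capacity points above can be compared with the current flat topology in a rough sketch. It rests on several assumptions: "32K" is taken as 32767 usable logical port keys, each node switch spends one port on a single hypothetical switch-to-switch uplink, and gateway router ports are ignored. All function names are illustrative and do not come from OVN or ovn-kubernetes.

```python
# Assumption: the "32K" per-switch port limit, taken as 32767 usable port keys.
MAX_PORTS_PER_SWITCH = 32767


def spine_leaf_ports(nodes: int, pods_per_node: int) -> tuple[int, int]:
    """Return (ports per node switch, ports on the transit switch) in the spine-leaf layout."""
    node_switch = pods_per_node + 1  # local VIFs + one hypothetical switch-to-switch uplink
    transit_switch = nodes  # one switch-to-switch port per node switch (GR ports ignored)
    return node_switch, transit_switch


def nodes_updated_by_new_pod(nodes: int, spine_leaf: bool) -> int:
    """How many nodes' logical topology changes when a single pod is added."""
    # Flat: the pod's LSP (local or remote) appears on every node's transit switch.
    # Spine-leaf: only the node switch where the pod starts gains a port.
    return 1 if spine_leaf else nodes


def max_pods_per_network(nodes: int, spine_leaf: bool) -> int:
    """Upper bound on pods in one network imposed by the per-switch port limit."""
    if spine_leaf:
        # Each node switch reserves one port for its uplink.
        return nodes * (MAX_PORTS_PER_SWITCH - 1)
    return MAX_PORTS_PER_SWITCH


print(spine_leaf_ports(nodes=10, pods_per_node=100))  # (101, 10)
print(nodes_updated_by_new_pod(10, spine_leaf=True))  # 1
print(max_pods_per_network(10, spine_leaf=True))  # 327660
```

      With this layout, the port count on each switch depends only on local pods (node switches) or on the number of nodes (the transit switch), never on the cluster-wide pod count.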

      Since nodes will not know the MACs of all the pods in the cluster, they will use ARP/ND to discover them, as switches with localnet ports do.
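      A toy model of the forwarding decision on one node switch follows. It assumes the switch-to-switch port behaves like a localnet port whose addresses are "unknown". Only local pod MACs are configured. A broadcast, such as an ARP request, is flooded to the local ports and the uplink. A unicast to a MAC the switch does not know is sent out the uplink. The port name `uplink`, the `NodeSwitch` class, and the MAC addresses are all hypothetical, and this is not OVN's actual logical pipeline.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"
UPLINK = "uplink"  # hypothetical name for the switch-to-switch port


class NodeSwitch:
    """Toy node switch: knows only local pod MACs, like a switch with a localnet port."""

    def __init__(self, local_macs: dict[str, str]) -> None:
        # MAC -> local port name; remote pods' MACs are never configured here.
        self.local_macs = dict(local_macs)

    def output_ports(self, in_port: str, dst_mac: str) -> list[str]:
        if dst_mac == BROADCAST:
            # Broadcast (e.g. an ARP request): flood everywhere except the ingress port.
            ports = sorted(self.local_macs.values()) + [UPLINK]
            return [p for p in ports if p != in_port]
        if dst_mac in self.local_macs:
            port = self.local_macs[dst_mac]
            return [] if port == in_port else [port]
        # Unknown unicast: hand it to the uplink, unless it already came from there.
        return [] if in_port == UPLINK else [UPLINK]


# Illustrative MACs for VIF11/VIF12 on this node and VIF21 on another node.
sw = NodeSwitch({"0a:58:00:00:00:11": "vif11", "0a:58:00:00:00:12": "vif12"})
print(sw.output_ports("vif11", BROADCAST))  # ['vif12', 'uplink']
print(sw.output_ports("vif11", "0a:58:00:00:00:21"))  # ['uplink']
```

      Once the remote pod answers the ARP request, the sender's unicast traffic to it takes the unknown-unicast path to the uplink. The node switch therefore never needs to store remote MACs.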

      All in all, the logic is similar to having node switches with localnet ports, except that a separate OVN logical switch plays the role of the physical provider network.

        AZ1                AZ2               AZ1               AZ2
        ---                ---               ---               ---
      
        GR1                GR2               GR1               GR2
         |                  |                 |                 |
       +-+------------------+-+               |  +-----------+  |
       |          TS          |               |  |    TS     |  |
       +-+---+----------+---+-+     =>        |  +-+-------+-+  |
         |   |          |   |                 |    |       |    |
       VIF11 |          | VIF22             +-+----+-+   +-+----+-+
           VIF12      VIF21                 |   LS   |   |   LS   |
                                            +-+---+--+   +--+---+-+
                                              |   |         |   |
                                            VIF11 |         | VIF22
                                                VIF12     VIF21
      

      (Not sure whether the gateway routers should connect to the node switches or to the transit one, but it's the same L2 domain anyway, so either way should be fine.)

              OVN Team (ovnteam@redhat.com)
              Ilya Maximets (imaximet@redhat.com)
              Jianlin Shi