In the UDN L2 topology of ovn-kubernetes, all pods of the same network are attached to the same transit switch. On every node, this switch has one LSP for each local pod and one remote LSP for each pod on every other node. So, in a 10-node cluster with 100 pods per node, each node will have 1000 ports in the transit logical switch: 100 local and 900 remote ones.
This topology has limitations:
- Adding a new pod anywhere in the cluster means adding a new port to every node in the cluster.
- OVN supports up to 32K ports in a single switch, i.e. a single network can't have more than 32K pods, which is a serious limitation.
- Multicast traffic can't be delivered to more than a few hundred ports on such a large switch, because of resubmit limits.
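To put numbers on the first two limitations, here is a minimal sketch of the port arithmetic for the current flat topology. The function name and dictionary keys are made up for illustration, and the exact value behind "32K" is an assumption (32 * 1024); only the 10-node / 100-pod figures come from the description above.

```python
# Assumption: "32K" is read as 32 * 1024; the real OVN limit may differ slightly.
MAX_PORTS_PER_SWITCH = 32 * 1024


def flat_l2_ports(nodes: int, pods_per_node: int) -> dict:
    """Port counts for one network on a single shared switch (current topology)."""
    total_pods = nodes * pods_per_node
    return {
        # LSPs for pods running on this node
        "local_ports_per_node": pods_per_node,
        # remote LSPs for pods running on every other node
        "remote_ports_per_node": total_pods - pods_per_node,
        # every node carries the whole switch
        "ports_per_node": total_pods,
        # a new pod anywhere adds a port on every node
        "nodes_touched_per_new_pod": nodes,
        "fits_port_limit": total_pods <= MAX_PORTS_PER_SWITCH,
    }


example = flat_l2_ports(nodes=10, pods_per_node=100)
print(example)
```

With 500 nodes and 100 pods per node the same function reports that the network no longer fits into a single switch, while every pod creation still touches all 500 nodes.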
A potential solution is to split this one large switch into a spine-leaf topology by having a smaller switch per node that only has local LSPs and a transit switch that connects all these node switches together. Switches should be connected via "localnet"-style switch-switch ports that do not exist in OVN today and are the main new feature here.
Such a topology should allow:
- Locality of changes: Adding new pods should only require adding LSPs to a single node switch on the same node where the pod is starting.
- Higher number of pods in the cluster: The 32K port limit would apply to each node switch, so we could create up to roughly 32K pods per node instead of 32K across the whole cluster.
- Better handling of multicast traffic: The pipeline will be split across nodes, so every node will process multicast for its own pods and forward traffic to appropriate remote nodes if necessary.
- Still an L2 topology without any routers involved for EW traffic.
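As a rough illustration of the points above, the sketch below counts ports in the proposed layout. It assumes exactly one switch-switch uplink per node switch and one matching port per node on the transit switch, and it ignores gateway router ports; these are assumptions for the sake of the arithmetic, not details fixed by this proposal. All names are hypothetical.

```python
def spine_leaf_ports(nodes: int, pods_per_node: int) -> dict:
    """Port counts for one network split into per-node switches plus a transit switch."""
    return {
        # local pod LSPs plus one assumed uplink towards the transit switch
        "node_switch_ports": pods_per_node + 1,
        # one assumed port per node switch, independent of the number of pods
        "transit_switch_ports": nodes,
        # a new pod only adds an LSP to the switch of its own node
        "nodes_touched_per_new_pod": 1,
        # for comparison: size of the single shared switch in the current topology
        "flat_switch_ports": nodes * pods_per_node,
    }


example = spine_leaf_ports(nodes=10, pods_per_node=100)
print(example)
```

For the 10-node, 100-pods-per-node cluster this gives 101 ports on each node switch and 10 on the transit switch, compared to 1000 ports on the single shared switch.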
Since nodes will not know the MACs of all the pods in the cluster, they will use ARP/ND to discover them, the same way switches with localnet ports do.
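The forwarding decision this implies on a node switch can be sketched as a toy model. This is generic L2 behaviour written out in Python, not the OVN logical pipeline; the `NodeSwitch` class, the `uplink` port name and the MAC addresses are all hypothetical, and whether broadcasts are really flooded this way in the final design is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"
UPLINK = "uplink"  # hypothetical name for the port towards the transit switch


@dataclass
class NodeSwitch:
    """Toy per-node switch that only knows the MACs of its local pods."""

    local_ports: Dict[str, str] = field(default_factory=dict)  # MAC -> port name

    def add_pod(self, mac: str, port: str) -> None:
        self.local_ports[mac.lower()] = port

    def output_ports(self, dst_mac: str, in_port: str) -> List[str]:
        dst = dst_mac.lower()
        if dst == BROADCAST:
            # e.g. an ARP request: flood to local pods and the uplink,
            # but never back to the port the packet came from
            ports = list(self.local_ports.values()) + [UPLINK]
            return [p for p in ports if p != in_port]
        if dst in self.local_ports:
            # known local MAC: deliver directly
            return [self.local_ports[dst]]
        # unknown MAC: it can only live behind the uplink; drop if it came from there
        return [] if in_port == UPLINK else [UPLINK]


sw = NodeSwitch()
sw.add_pod("0a:58:0a:00:00:01", "pod-a")
sw.add_pod("0a:58:0a:00:00:02", "pod-b")

arp_request = sw.output_ports(BROADCAST, in_port="pod-a")
to_remote = sw.output_ports("0a:58:0a:00:01:05", in_port="pod-a")
to_local = sw.output_ports("0a:58:0a:00:00:02", in_port="pod-a")
from_uplink_unknown = sw.output_ports("0a:58:0a:00:01:05", in_port=UPLINK)
print(arp_request, to_remote, to_local, from_uplink_unknown)
```

The point of the model is that the node switch never needs an LSP for a remote pod: anything it does not recognise goes to the single uplink.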
All in all, the logic is similar to just having node switches with localnet ports, but a separate OVN Logical Switch plays the role of a physical provider network instead.
       AZ1                AZ2              AZ1               AZ2
       ---                ---              ---               ---
       GR1                GR2              GR1               GR2
        |                  |                |                 |
      +-+------------------+-+              |  +-----------+  |
      |          TS          |              |  |    TS     |  |
      +-+---+----------+---+-+     =>       |  +-+-------+-+  |
        |   |          |   |                |    |       |    |
      VIF11 |          | VIF22            +-+----+-+   +-+----+-+
          VIF12      VIF21                |   LS   |   |   LS   |
                                          +-+---+--+   +--+---+-+
                                            |   |         |   |
                                          VIF11 |         | VIF22
                                              VIF12     VIF21
(Not sure if gateway router should connect to node switches or the transit one, but it's the same L2 domain anyway, so should be fine either way.)