  1. OpenShift Bugs
  2. OCPBUGS-32245

[OVN][METALLB] LoadBalancer service IPs advertised with L2 only on secondary interfaces do not always work


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • None
    • 4.13.z, 4.14.z, 4.15
    • Networking / Metal LB
    • None
    • Moderate
    • No
    • Rejected
    • False

      Description of problem:

      Since OCP 4.13, MetalLB can make L2 advertisements on additional interfaces. However, this feature seems to have several limitations, which is why I am opening this ticket. From the tests I have done, there is not necessarily a bug here; if that is the case, this ticket should be moved to documentation, since I think we should be clearer about what exactly customers can expect from this L2Advertisement.
      
      Due to how OVN handles traffic for LoadBalancer services, even with routing via host enabled, MetalLB with L2Adv on additional interfaces that are not part of the main node network does not seem to work with the more complex network and routing implementations that can be configured at the OS level. OCP 4.14 introduced new features to handle symmetric routing, but these seem to be limited to the use of BGP. If this is true, we should definitely add a note informing customers, because not all customers have BGP implemented in their network, and for many it is not worth it when a simpler method to advertise LoadBalancer IPs is available.
      
      When using L2Adv on secondary interfaces, connections seem to work only for VLANs or directly connected routes. Otherwise, it either needs a lot of work in the infrastructure to achieve proper routing to and from the respective service, or it is simply not supported from our side.
      Even when using egressService on OCP 4.14 to ensure that traffic uses the respective network on the respective routing table, a traceroute to the loadBalancerIP gets a response from br-ex.
      This of course assumes that everything on the node is configured for the secondary interfaces, as in the example here [1].
      
      In the captures I took, the traffic arrives at the node and is DNATed to the ClusterIP, but then seems to get lost on its way out, and the connection gets stuck in TCP retransmissions.
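
      The symptom above is consistent with an asymmetric return path, although that is not confirmed here. The following is a minimal sketch of plain destination-based routing, not of OVN itself: the routing table, the interface name ens4, and all addresses are hypothetical, and policy routing and egressService are deliberately not modelled. It only shows why a client that is directly connected to the secondary network gets its reply on the same interface, while a routed client gets it via br-ex.

```python
import ipaddress

# Hypothetical main routing table of a node: (destination prefix, egress interface).
# br-ex holds the default route; ens4 stands in for the secondary interface
# that MetalLB advertises on. All values are illustrative assumptions.
ROUTES = [
    ("0.0.0.0/0", "br-ex"),
    ("192.168.10.0/24", "br-ex"),
    ("10.20.0.0/24", "ens4"),
]


def egress_interface(dst, routes=ROUTES):
    """Return the interface selected by longest-prefix match for dst."""
    addr = ipaddress.ip_address(dst)
    candidates = [
        (ipaddress.ip_network(prefix), dev)
        for prefix, dev in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    # The most specific matching prefix wins, as in a kernel route lookup.
    network, dev = max(candidates, key=lambda entry: entry[0].prefixlen)
    return dev


def is_symmetric(client_ip, ingress_dev, routes=ROUTES):
    """True if the reply to client_ip leaves on the interface the request arrived on."""
    return egress_interface(client_ip, routes) == ingress_dev


if __name__ == "__main__":
    for client in ("10.20.0.50", "172.16.5.9"):
        print(client, "reply via", egress_interface(client),
              "symmetric:", is_symmetric(client, "ens4"))
```

      In this model, only the client inside 10.20.0.0/24 is answered through ens4; any client reached through a router falls back to the default route on br-ex unless additional routes or routing rules are configured on the node.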
      
      [1] https://access.redhat.com/solutions/19596
      
          

      Version-Release number of selected component (if applicable):

      OCP 4.13+ with OVN-Kubernetes
          

      How reproducible:

      Depending on the network implementation.
          

            fpaoline@redhat.com Federico Paolinelli
            rhn-support-andcosta Andre Costa
            Arti Sood Arti Sood