  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-23025

LoadBalancer is attached to provider LogicalSwitch (localnet), causing first-SYN drops and delay


    • VANS Bugs to review, VANS-29 Christmas Edition, VANS-28
    • 3
    • Moderate

      When a Load Balancer is created with the OVN provider, it is associated not only with the tenant LogicalSwitch but also with the provider LogicalSwitch that has a localnet port (VLAN 100). While this association is present, connections arriving via the Floating IP (FIP) repeatedly lose their first SYN and/or see about 1s of added latency. Detaching the LB from the provider LS immediately removes the symptom.

      Pre-conditions / Environment

        • Provider network (external): 10.20.10.0/24 on VLAN 100, OVN localnet port present
        • Tenant network (internal): 10.100.100.0/24 (Geneve)
        • Router
            - Internal interface: 10.100.100.1/24 (tenant)
            - External gateway: 10.20.10.x/24 (provider, SNAT enabled)
        • Load Balancer (OVN provider)
            - VIP: 10.100.100.X (on tenant subnet)
            - FIP: 10.20.10.Y (on provider network)
            - Member: 10.20.10.Z:<PORT> (member IP belongs to provider network)

      Steps to reproduce

      1. Create LB / VIP / Listener / Pool (VIP on tenant subnet):

      TENANT_SUBNET_ID="<tenant-subnet-uuid>" # 10.100.100.0/24
      PROVIDER_SUBNET_ID="<provider-subnet-uuid>" # 10.20.10.0/24 (VLAN 100)
      LB_NAME=lb-test

      openstack loadbalancer create --provider ovn --vip-subnet-id $TENANT_SUBNET_ID --wait $LB_NAME
      openstack loadbalancer listener create --protocol TCP --protocol-port 80 --wait ${LB_NAME}-lis $LB_NAME
      openstack loadbalancer pool create --listener ${LB_NAME}-lis --protocol TCP --lb-algorithm SOURCE_IP_PORT --wait ${LB_NAME}-pool

      2. Register a member on the provider subnet:

      openstack loadbalancer member create \
      --subnet-id $PROVIDER_SUBNET_ID \
      --address 10.20.10.Z \
      --protocol-port <PORT> \
      --wait \
      ${LB_NAME}-pool

      3. Associate a FIP to the VIP port:

      VIP_PORT_ID=$(openstack loadbalancer show $LB_NAME -f value -c vip_port_id)
      openstack floating ip set --port $VIP_PORT_ID 10.20.10.Y

      4. Inspect LB associations:

      ovn-nbctl list load_balancer <LB-UUID> | grep -E 'lr_ref|ls_refs|vips|external_ids'
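
      The inspection in step 4 can be turned into a yes/no check. The sketch below is not part of the original report: lb_refs_switch is a hypothetical helper name, and it assumes the provider LS name (neutron-<provider-net-uuid>) appears verbatim in the LB's external_ids when the association exists. It does a plain substring match and does not parse the ls_refs value.

```shell
# Hypothetical helper (not from the report): reads a Load_Balancer's
# external_ids text on stdin and reports whether the given logical
# switch name occurs in it. Plain fixed-string match, no parsing.
lb_refs_switch() {
    ls_name="$1"
    if grep -qF -- "$ls_name"; then
        echo "attached"
    else
        echo "not attached"
    fi
}

# Intended use (placeholders left as in the report):
#   ovn-nbctl --bare --columns=external_ids list load_balancer <LB-UUID> \
#       | lb_refs_switch neutron-<provider-net-uuid>
```

      The switch-side view can be cross-checked with ovn-nbctl ls-lb-list neutron-<provider-net-uuid>, which lists the load balancers attached to that logical switch.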

      Expected behavior

      • With the VIP on the tenant subnet and the FIP on the provider network, the LB should be associated with the logical router (and, if needed, the tenant LS), but not with the provider LS that contains a localnet port. NAT and load balancing should then happen on the LR path, so that FIP flows are handled consistently.

      Actual result

      • external_ids:ls_refs of the LB includes both the tenant LS and the provider LS (VLAN 100, localnet).
      • Requests from an external client to the FIP (10.20.10.Y:80) show first-SYN drops or ~1s delay.
      • Workaround: removing only the provider LS association makes the issue vanish instantly:

      ovn-nbctl ls-lb-del neutron-<provider-net-uuid> <LB-UUID>
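
      To compare behavior before and after the workaround, the connect time to the FIP can be sampled from an external client. The following is a minimal sketch under stated assumptions, not the procedure used in this report: classify_connect and probe_fip are hypothetical names, the 0.9 s threshold is an assumed cut-off chosen to catch the ~1s delay described above (which would be consistent with a retransmitted SYN), and curl is assumed to be available on the client.

```shell
# Classify one TCP connect time (in seconds) as "slow" or "ok".
# The 0.9 s default threshold is an assumption, not a value from the report.
classify_connect() {
    awk -v t="$1" -v limit="${2:-0.9}" \
        'BEGIN { if (t + 0 >= limit + 0) print "slow"; else print "ok" }'
}

# Open COUNT fresh connections to FIP:PORT and count the slow ones.
# Each curl invocation is a new process, so every sample is a new TCP handshake.
probe_fip() {
    fip="$1"; port="$2"; count="${3:-20}"
    slow=0; i=0
    while [ "$i" -lt "$count" ]; do
        t=$(curl -s -o /dev/null --max-time 5 -w '%{time_connect}' "http://${fip}:${port}/")
        if [ "$(classify_connect "$t")" = "slow" ]; then
            slow=$((slow + 1))
        fi
        i=$((i + 1))
    done
    echo "slow connects: ${slow}/${count}"
}

# Intended use (placeholder address as in the report):
#   probe_fip 10.20.10.Y 80 50
```

      Running the probe once with the provider LS association in place and once after ls-lb-del gives a rough before/after count. Note that curl reports a connect time of 0 when the connection fails outright, so failed attempts are not counted as slow in this sketch.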

      Comparative evidence

      • Another LB in the same environment that is not associated to the provider LS shows no first-SYN drop/latency under identical FIP topology. The behavior correlates with the LB–LS association (provider LS present vs. absent).

       

              ashtempl Arkady Shtempler
              froyo@redhat.com Fernando Royo
              Arkady Shtempler, Fernando Royo
              Arkady Shtempler Arkady Shtempler
              rhos-dfg-networking-squad-vans