OpenShift Bugs / OCPBUGS-55245

BGP peers not getting established


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • 4.17, 4.18, 4.19
    • Networking / Metal LB
    • Quality / Stability / Reliability

      Description of problem:

      After installing and configuring the MetalLB Operator, no BGP peers appear on the network appliances and no routes are advertised.
          

      Version-Release number of selected component (if applicable):

      4.20, 4.19, 4.18 and 4.17
          

      How reproducible:

      Always
          

      Steps to Reproduce:

          1. Deploy one of the latest releases of the OCP versions listed above on bare-metal servers.
          2. Deploy the MetalLB Operator and configure it.
          3. Check the BGP peers and BGP routes on the network appliances; none are present.
      
      Note: OCP upgrades from 4.17 to 4.18 also show this issue
          

      Actual results:

      BGP peers are not established and BGP routes are not advertised to the network appliance.
          

      Expected results:

      BGP peers should be established and BGP routes should be advertised to the network appliance.
          

      Additional info:

      - We use an NMState LACP bond interface, configured on day 2, to carry dual-stack BGP traffic.
      - In 4.18, this works with metallb-operator.v4.18.0-202504021503.
      - In 4.17, the last working job used metallb-operator.v4.17.0-202503251509.
          

      This is our baremetal setup:

      $ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.18.10   True        False         94m     Cluster version is 4.18.10
      
      $ oc get nodes
      NAME       STATUS   ROLES                  AGE    VERSION
      master-0   Ready    control-plane,master   155m   v1.31.7
      master-1   Ready    control-plane,master   155m   v1.31.7
      master-2   Ready    control-plane,master   154m   v1.31.7
      worker-0   Ready    worker                 104m   v1.31.7
      worker-1   Ready    worker                 102m   v1.31.7
      worker-2   Ready    worker                 104m   v1.31.7
      worker-3   Ready    worker                 105m   v1.31.7
      
      $ oc -n metallb-system get pods -o wide
      NAME                                                   READY   STATUS    RESTARTS   AGE   IP              NODE       NOMINATED NODE   READINESS GATES
      controller-c7bf456c4-v4d65                             2/2     Running   0          53m   10.129.2.12     worker-2   <none>           <none>
      frr-k8s-4c6sm                                          6/6     Running   0          53m   192.168.12.25   worker-1   <none>           <none>
      frr-k8s-gqhmt                                          6/6     Running   0          53m   192.168.12.27   worker-3   <none>           <none>
      frr-k8s-rs7rk                                          6/6     Running   0          53m   192.168.12.26   worker-2   <none>           <none>
      frr-k8s-wczjz                                          6/6     Running   0          53m   192.168.12.24   worker-0   <none>           <none>
      frr-k8s-webhook-server-fc99ff78-vxgrx                  1/1     Running   0          53m   10.130.2.9      worker-1   <none>           <none>
      metallb-operator-controller-manager-67594c4b87-brxxd   1/1     Running   0          62m   10.129.2.8      worker-2   <none>           <none>
      metallb-operator-webhook-server-7f44b9699c-jb6tm       1/1     Running   0          62m   10.129.2.9      worker-2   <none>           <none>
      speaker-m7mrt                                          2/2     Running   0          53m   192.168.12.27   worker-3   <none>           <none>
      speaker-p8nsr                                          2/2     Running   0          53m   192.168.12.24   worker-0   <none>           <none>
      speaker-tmf2t                                          2/2     Running   0          53m   192.168.12.25   worker-1   <none>           <none>
      speaker-xcjts                                          2/2     Running   0          53m   192.168.12.26   worker-2   <none>           <none>
      
      $ oc -n metallb-system get csv
      NAME                                    DISPLAY                     VERSION               REPLACES                 PHASE
      metallb-operator.v4.18.0-202504090803   MetalLB Operator            4.18.0-202504090803                            Succeeded
      

      These are the errors logged by the pods on a worker node, followed by the BGP state reported by FRR:

      
      [dciteam@jumphost collection-scripts]$ oc -n metallb-system logs speaker-p8nsr | grep error
      Defaulted container "speaker" out of: speaker, kube-rbac-proxy
      {"caller":"speakerlist.go:259","error":null,"expected":3,"joined":2,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2025-04-22T20:00:14Z"}
      {"caller":"announcer.go:131","error":"creating NDP responder for \"eno5np0\": listen ip6:ipv6-icmp fe80::f603:43ff:fecd:16e0%eno5np0: bind: cannot assign requested address","interface":"eno5np0","level":"error","msg":"failed to create NDP responder","op":"createNDPResponder","ts":"2025-04-22T20:01:44Z"}
      {"caller":"announcer.go:131","error":"creating NDP responder for \"ens2f0\": listen ip6:ipv6-icmp fe80::3efd:feff:febb:2078%ens2f0: bind: cannot assign requested address","interface":"ens2f0","level":"error","msg":"failed to create NDP responder","op":"createNDPResponder","ts":"2025-04-22T20:01:44Z"}
      {"caller":"announcer.go:131","error":"creating NDP responder for \"ens2f1\": listen ip6:ipv6-icmp fe80::3efd:feff:febb:2079%ens2f1: bind: cannot assign requested address","interface":"ens2f1","level":"error","msg":"failed to create NDP responder","op":"createNDPResponder","ts":"2025-04-22T20:01:44Z"}
      {"caller":"announcer.go:131","error":"creating NDP responder for \"eno6np1\": listen ip6:ipv6-icmp fe80::f603:43ff:fecd:16e8%eno6np1: bind: cannot assign requested address","interface":"eno6np1","level":"error","msg":"failed to create NDP responder","op":"createNDPResponder","ts":"2025-04-22T20:08:14Z"}
      
      [dciteam@jumphost collection-scripts]$ oc -n metallb-system exec daemonset/frr-k8s -c frr -- vtysh -c "show ip bgp summary"
      
      IPv4 Unicast Summary (VRF default):
      BGP router identifier 192.168.16.139, local AS number 65012 vrf-id 0
      BGP table version 1
      RIB entries 1, using 192 bytes of memory
      Peers 8, using 5797 KiB of memory
      
      Neighbor             V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
      192.168.15.2         4      65000         0         0        0    0    0    never         Idle        0 N/A
      192.168.15.3         4      65000         0         0        0    0    0    never         Idle        0 N/A
      192.168.16.130       4      65000         0         0        0    0    0    never         Idle        0 N/A
      192.168.16.131       4      65000         0         0        0    0    0    never         Idle        0 N/A
      fd48:de67:5083:16::2 4      65000         0         0        0    0    0    never         Idle        0 N/A
      fd48:de67:5083:16::3 4      65000         0         0        0    0    0    never         Idle        0 N/A
      fdb0:5b22:e86a:15::2 4      65000         0         0        0    0    0    never         Idle        0 N/A
      fdb0:5b22:e86a:15::3 4      65000         0         0        0    0    0    never         Idle        0 N/A
      
      Total number of neighbors 8
      
      [dciteam@jumphost collection-scripts]$ oc -n metallb-system exec daemonset/frr-k8s -c frr -- vtysh -c "show ip bgp"
      BGP table version is 1, local router ID is 192.168.16.139, vrf id 0
      Default local pref 100, local AS 65012
      Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                     i internal, r RIB-failure, S Stale, R Removed
      Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
      Origin codes:  i - IGP, e - EGP, ? - incomplete
      RPKI validation codes: V valid, I invalid, N Not found
      
          Network          Next Hop            Metric LocPrf Weight Path
       *> 10.100.1.50/32   0.0.0.0                  0         32768 i
      
      Displayed  1 routes and 1 total paths
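
The neighbor table above can be checked mechanically when comparing working and failing operator builds. The following is a minimal sketch, not part of the original report: it assumes the FRR `show ip bgp summary` layout shown above, where the tenth column (`State/PfxRcd`) holds a state name such as `Idle` for a session that is not established and a prefix count once it is. The function name `non_established_peers` is hypothetical, and the sample text is copied from the output in this report.

```python
# Sample copied from the `show ip bgp summary` output in this report.
SUMMARY = """\
Neighbor             V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
192.168.15.2         4      65000         0         0        0    0    0    never         Idle        0 N/A
192.168.15.3         4      65000         0         0        0    0    0    never         Idle        0 N/A
192.168.16.130       4      65000         0         0        0    0    0    never         Idle        0 N/A
192.168.16.131       4      65000         0         0        0    0    0    never         Idle        0 N/A
fd48:de67:5083:16::2 4      65000         0         0        0    0    0    never         Idle        0 N/A
fd48:de67:5083:16::3 4      65000         0         0        0    0    0    never         Idle        0 N/A
fdb0:5b22:e86a:15::2 4      65000         0         0        0    0    0    never         Idle        0 N/A
fdb0:5b22:e86a:15::3 4      65000         0         0        0    0    0    never         Idle        0 N/A

Total number of neighbors 8
"""


def non_established_peers(summary):
    """Map each neighbor address to its state name when the session is not established.

    Assumption: one neighbor per line, with the state in the tenth column.
    """
    peers = {}
    in_table = False
    for line in summary.splitlines():
        fields = line.split()
        if not fields:
            in_table = False  # a blank line ends the neighbor table
            continue
        if fields[0] == "Neighbor":
            in_table = True  # header row; data rows follow
            continue
        if in_table and len(fields) >= 10:
            state = fields[9]
            # An established session shows a prefix count here, not a state name.
            if not state.isdigit():
                peers[fields[0]] = state
    return peers


for neighbor, state in non_established_peers(SUMMARY).items():
    print(f"{neighbor}: {state}")
```

With the output from this cluster, every one of the eight configured neighbors is reported, all in the `Idle` state. The sketch does not handle output where a long neighbor address wraps onto its own line.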
      

      On the network appliance, no BGP peers or routes related to this cluster are displayed:

      dfw01edge-sp01#show ip bgp summary
      BGP summary information for VRF default
      Router identifier 192.168.103.2, local AS number 65000
      Neighbor Status Codes: m - Under maintenance
        Neighbor         V  AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State   PfxRcd PfxAcc
        192.168.5.30     4  65001         112699    112754    0    0    7d05h Estab   2      2
      dfw01edge-sp01#show ip bgp
      BGP routing table information for VRF default
      Router identifier 192.168.103.2, local AS number 65000
      Route status codes: s - suppressed, * - valid, > - active, # - not installed, E - ECMP head, e - ECMP
                          S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast
      Origin codes: i - IGP, e - EGP, ? - incomplete
      AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop
      
               Network                Next Hop            Metric  LocPref Weight  Path
       * >     10.10.0.0/24           192.168.5.30          0       100     0       65001 i
       * >     10.20.0.0/24           192.168.5.30          0       100     0       65001 i
      dfw01edge-sp01#show ip route bgp
      
      VRF: default
      Codes: C - connected, S - static, K - kernel, 
             O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
             E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
             N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
             R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
             O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
             NG - Nexthop Group Static Route, V - VXLAN Control Service,
             DH - DHCP client installed default route, M - Martian,
             DP - Dynamic Policy Route, L - VRF Leaked
      
       B E      10.10.0.0/24 [200/0] via 192.168.5.30, Vlan10
       B E      10.20.0.0/24 [200/0] via 192.168.5.30, Vlan10
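
The same check can be made from the appliance side. This is a minimal sketch under stated assumptions: it parses the appliance's `show ip bgp summary` layout shown above (neighbor address in the first column, remote AS in the third) and looks for any neighbor in the cluster's local AS, 65012, as reported by FRR earlier in this report. The function name `neighbors_by_as` is hypothetical.

```python
# Sample copied from the appliance output in this report.
APPLIANCE_SUMMARY = """\
BGP summary information for VRF default
Router identifier 192.168.103.2, local AS number 65000
Neighbor Status Codes: m - Under maintenance
  Neighbor         V  AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State   PfxRcd PfxAcc
  192.168.5.30     4  65001         112699    112754    0    0    7d05h Estab   2      2
"""

CLUSTER_AS = 65012  # local AS reported by FRR on the cluster nodes


def neighbors_by_as(summary):
    """Group neighbor addresses by remote AS number."""
    grouped = {}
    in_table = False
    for line in summary.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The header row starts with "Neighbor V"; the "Neighbor Status Codes"
        # legend line also starts with "Neighbor" and must be skipped.
        if fields[0] == "Neighbor" and len(fields) > 1 and fields[1] == "V":
            in_table = True
            continue
        if in_table and len(fields) >= 3 and fields[2].isdigit():
            grouped.setdefault(int(fields[2]), []).append(fields[0])
    return grouped


cluster_peers = neighbors_by_as(APPLIANCE_SUMMARY).get(CLUSTER_AS, [])
print(f"Neighbors in AS {CLUSTER_AS}: {cluster_peers}")
```

For the output above, the only neighbor found is 192.168.5.30 in AS 65001, so the list for AS 65012 is empty, which matches the symptom described in this report.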
      
      

              obraunsh@redhat.com Ori Braunshtein
              rhn-gps-manrodri Manuel Rodriguez
              Evgeny Levin