• Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • rhos-18.0.15
    • edpm-ansible
    • rhos-connectivity-neutron
    • Important

      This was reported while testing ovn-bgp-agent with EVPN and FRR. EVPN is not supported, but the FRR issue itself is legitimate. At some point the frr container fails and has to be restarted.

      The frr container running on the EDPM nodes is restarted periodically, but every restart fails with the following errors:

      Can't bind zserv socket on (null): Address already in use
      Cannot bind path /var/run/frr/bgpd.vty: Address already in use
      rm: cannot remove '/var/run/frr/bgpd.pid': Permission denied

      More info about the scenario can be found here:
      https://github.com/marbindrakon/ovn-bgp-agent-nat-tester/blob/main/CLAUDE_TROUBLESHOOTING.md#failure-mode-2-frr-crash-loop-bgpd-down

      I think the restart loop could be fixed by removing the stale .vty and .pid files when the container starts, before the frr process is launched. The container startup is defined here:
      https://github.com/openstack-k8s-operators/edpm-ansible/blob/main/roles/edpm_frr/files/kolla_config/frr.yaml
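      A minimal sketch of the proposed cleanup follows. It assumes the kolla config's startup command could be pointed at a small wrapper like this. The wrapper, the cleanup_stale_frr_files function and the FRR_RUN_DIR variable are hypothetical and do not exist in edpm-ansible. The wrapper deletes leftover *.vty and *.pid files and then execs the real frr command passed as arguments. Two caveats apply. First, this is only safe if no other FRR daemon is still using that runtime directory. Second, the "Permission denied" error above suggests that the cleanup must run as a user that is allowed to delete those files, for example before privileges are dropped.

```shell
#!/bin/sh
# Hypothetical pre-start wrapper for the frr container (sketch only).
# FRR_RUN_DIR is an assumed variable; it defaults to the runtime
# directory that appears in the error messages of this report.
FRR_RUN_DIR="${FRR_RUN_DIR:-/var/run/frr}"

# Remove stale vty sockets and pid files left behind by a crashed run.
cleanup_stale_frr_files() {
    dir="$1"
    # Nothing to clean if the runtime directory does not exist yet.
    [ -d "$dir" ] || return 0
    for f in "$dir"/*.vty "$dir"/*.pid; do
        # An unmatched glob stays literal, so skip non-existent paths.
        [ -e "$f" ] || continue
        rm -f -- "$f" || echo "warning: could not remove $f" >&2
    done
    return 0
}

# Only act when a startup command is given (assumed invocation:
# "wrapper.sh <real frr start command...>"), then replace this shell
# with that command so it keeps the container's main PID.
if [ "$#" -gt 0 ]; then
    cleanup_stale_frr_files "$FRR_RUN_DIR"
    exec "$@"
fi
```

Leaving the original start command untouched and only prefixing it keeps the change small. The same cleanup could instead be inlined into whatever command frr.yaml already runs.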

      Workaround:

      sudo systemctl stop edpm_frr
      sleep 3
      sudo systemctl start edpm_frr

              eolivare Eduardo Olivares Toledo
              rhos-dfg-networking-squad-neutron
              Votes: 0
              Watchers: 1
