Bug
Resolution: Unresolved
Normal
None
rhos-18.0.15
None
This was reported while testing ovn-bgp-agent with EVPN and FRR. EVPN is not supported, but the FRR issue is legitimate. At some point, the frr container fails and needs to be restarted.
The frr container running on the EDPM nodes is restarted periodically, but every restart fails with:
Can't bind zserv socket on (null): Address already in use
Cannot bind path /var/run/frr/bgpd.vty: Address already in use
rm: cannot remove '/var/run/frr/bgpd.pid': Permission denied
More info about the scenario can be found here:
https://github.com/marbindrakon/ovn-bgp-agent-nat-tester/blob/main/CLAUDE_TROUBLESHOOTING.md#failure-mode-2-frr-crash-loop-bgpd-down
I think the failure loop could be fixed by removing the stale .vty and .pid files when the container starts, before the frr process is launched:
https://github.com/openstack-k8s-operators/edpm-ansible/blob/main/roles/edpm_frr/files/kolla_config/frr.yaml
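A minimal sketch of that cleanup step, for illustration only. The function name cleanup_stale_frr_files is hypothetical and not taken from edpm-ansible, and the default directory /var/run/frr is assumed from the error messages above. How this would be wired into the kolla config is left open, and the "Permission denied" error suggests the cleanup would have to run as a user allowed to delete those files.

```shell
#!/bin/sh
# Hypothetical pre-start cleanup; the function name is illustrative only.
# The default directory is an assumption based on the paths in the error output.
cleanup_stale_frr_files() {
    run_dir="${1:-/var/run/frr}"
    # Remove vty sockets and pid files left over from a previous run.
    # -f keeps rm quiet and successful when nothing matches.
    rm -f "$run_dir"/*.vty "$run_dir"/*.pid
}
```

It would be called once, before the frr daemons are launched, for example `cleanup_stale_frr_files /var/run/frr`. The zserv socket from the first error message is not covered by this sketch.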
Workaround:
sudo systemctl stop edpm_frr
sleep 3
sudo systemctl start edpm_frr