Bug
Resolution: Done
Major
4.12.z
Incidents & Support
Important
9/18: green
Rejected
CNF Network Sprint 242, CNF Network Sprint 243
Customer Escalated
Description of problem:
Since the upgrade from OCP 4.10.X to OCP 4.12.29, we have started seeing issues with the MetalLB speaker pods.
We have to restart the speaker pods at regular intervals to restore service.
We see the following stack traces from a SIGSEGV (signal 11) in the bgpd daemon running in the speaker pod; the same log also shows bfdd receiving signal 11 within the same second.
$ for pod in $(omg get pod -A -o wide|grep -i speaker | awk '{print $2}') ; do echo $pod ; omg logs -n metallb-system -c frr $pod |grep -A30 "Received signal 11" ; done|less
speaker-98mxj
2023-09-04T14:47:49.455643366Z BGP: Received signal 11 at 1693838869 (si_addr 0x2, PC 0x7fcaec333c75); aborting...
2023-09-04T14:47:49.455940093Z BGP: /usr/lib64/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x5b) [0x7fcaee3c129b]
2023-09-04T14:47:49.455940093Z BGP: /usr/lib64/frr/libfrr.so.0(zlog_signal+0xe1) [0x7fcaee3c1491]
2023-09-04T14:47:49.455940093Z BGP: /usr/lib64/frr/libfrr.so.0(+0x83b28) [0x7fcaee3e5b28]
2023-09-04T14:47:49.455940093Z BGP: /lib64/libpthread.so.0(+0x12ce0) [0x7fcaec63ece0]
2023-09-04T14:47:49.455940093Z BGP: /lib64/libc.so.6(+0xccc75) [0x7fcaec333c75]
2023-09-04T14:47:49.455940093Z BGP: /lib64/libyang.so.1(lydict_remove+0x49) [0x7fcaee079709]
2023-09-04T14:47:49.455952425Z BGP: /lib64/libyang.so.1(lyd_free_attr+0x7b) [0x7fcaee0e11fb]
2023-09-04T14:47:49.455952425Z BGP: /lib64/libyang.so.1(+0x7d0f1) [0x7fcaee0e40f1]
2023-09-04T14:47:49.455952425Z BGP: /lib64/libyang.so.1(+0x7d209) [0x7fcaee0e4209]
2023-09-04T14:47:49.455952425Z BGP: /lib64/libyang.so.1(+0x7d221) [0x7fcaee0e4221]
2023-09-04T14:47:49.455952425Z BGP: /lib64/libyang.so.1(+0x7d221) [0x7fcaee0e4221]
2023-09-04T14:47:49.455952425Z BGP: /lib64/libyang.so.1(+0x7d221) [0x7fcaee0e4221]
2023-09-04T14:47:49.455959853Z BGP: /usr/lib64/frr/libfrr.so.0(nb_config_replace+0x32) [0x7fcaee3ca702]
2023-09-04T14:47:49.455959853Z BGP: /usr/lib64/frr/libfrr.so.0(nb_candidate_commit_apply+0x61) [0x7fcaee3cd3b1]
2023-09-04T14:47:49.455959853Z BGP: /usr/lib64/frr/libfrr.so.0(nb_candidate_commit+0x9e) [0x7fcaee3cd4be]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(+0x6b8dc) [0x7fcaee3cd8dc]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(nb_cli_apply_changes+0x619) [0x7fcaee3d0959]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(+0x4a7c5) [0x7fcaee3ac7c5]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(+0x4ab81) [0x7fcaee3acb81]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(+0x39525) [0x7fcaee39b525]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(cmd_execute_command+0x71) [0x7fcaee39d6f1]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(cmd_execute+0xd0) [0x7fcaee39d910]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(+0x98da5) [0x7fcaee3fada5]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(+0x98f80) [0x7fcaee3faf80]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(+0x9b9c0) [0x7fcaee3fd9c0]
2023-09-04T14:47:49.455990616Z BGP: /usr/lib64/frr/libfrr.so.0(thread_call+0x5a) [0x7fcaee3f52aa]
2023-09-04T14:47:49.455990616Z BGP: /usr/lib64/frr/libfrr.so.0(frr_run+0xe8) [0x7fcaee3bfe18]
2023-09-04T14:47:49.456010473Z BGP: /usr/lib/frr/bgpd(main+0x30c) [0x5571a251f9fc]
2023-09-04T14:47:49.456010473Z BGP: /lib64/libc.so.6(__libc_start_main+0xf3) [0x7fcaec2a1cf3]
2023-09-04T14:47:49.456010473Z BGP: /usr/lib/frr/bgpd(_start+0x2e) [0x5571a2521c2e]
---
2023-09-04T14:47:49.459087210Z BFD: Received signal 11 at 1693838869 (si_addr 0xffffffff8daa35f1, PC 0x7f605cf2ac92); aborting...
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x5b) [0x7f605cf1829b]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(zlog_signal+0xe1) [0x7f605cf18491]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x83b28) [0x7f605cf3cb28]
2023-09-04T14:47:49.459662341Z BFD: /lib64/libpthread.so.0(+0x12ce0) [0x7f605b195ce0]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x71c92) [0x7f605cf2ac92]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x7225d) [0x7f605cf2b25d]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(prefix_list_entry_update_finish+0x6c) [0x7f605cf2d26c]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x4f037) [0x7f605cf08037]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x6acc5) [0x7f605cf23cc5]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x6b06e) [0x7f605cf2406e]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(nb_candidate_commit_apply+0x37) [0x7f605cf24387]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(nb_candidate_commit+0x9e) [0x7f605cf244be]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x6b8dc) [0x7f605cf248dc]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(nb_cli_apply_changes+0x619) [0x7f605cf27959]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x4a7c5) [0x7f605cf037c5]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x4ab81) [0x7f605cf03b81]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x39525) [0x7f605cef2525]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(cmd_execute_command+0x71) [0x7f605cef46f1]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(cmd_execute+0xd0) [0x7f605cef4910]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x98da5) [0x7f605cf51da5]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x98f80) [0x7f605cf51f80]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x9b9c0) [0x7f605cf549c0]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(thread_call+0x5a) [0x7f605cf4c2aa]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(frr_run+0xe8) [0x7f605cf16e18]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib/frr/bfdd(main+0x27b) [0x5628836673db]
2023-09-04T14:47:49.459662341Z BFD: /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f605adf8cf3]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib/frr/bfdd(_start+0x2e) [0x5628836675be]
2023-09-04T14:47:49.459662341Z BFD: in thread vtysh_read scheduled from lib/vty.c:2682
2023-09-04T14:47:49.460284519Z 2023/09/04 14:47:49 WATCHFRR: [EC 268435457] bfdd state -> down : read returned EOF
2023-09-04T14:47:49.460405905Z 2023/09/04 14:47:49.460 ZEBRA: [EC 4043309122] Client 'bfd' encountered an error and is shutting down.
Counting the occurrences shows that the symptom affects nearly all speaker pods (13 of the 17 listed below logged at least one signal 11).
$ for pod in $(omg get pod -A -o wide|grep -i spea | awk '{print $2}') ; do SIGSEVCOUNT=$(omg logs -n metallb-system -c frr $pod |grep -ci "Received signal 11" ) ; printf "%s %d\n" $pod $SIGSEVCOUNT ; done
speaker-2hz45 7
speaker-66wcc 0
speaker-6fjrr 14
speaker-7pnzf 7
speaker-8bqvp 0
speaker-8fprv 0
speaker-98mxj 5
speaker-d4btd 5
speaker-dd95v 13
speaker-dl2x7 3
speaker-gwqg8 11
speaker-kbl82 21
speaker-rcfjx 28
speaker-s6xx5 3
speaker-v6mmk 0
speaker-vnf28 3
speaker-zlkt9 10
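The totals above count every "Received signal 11" line, whichever FRR daemon logged it. As a rough sketch, the count could be split per daemon, assuming the frr container logs are piped in and each crash line carries the `BGP:` / `BFD:` tag as its second field, as in the trace above. The function name `count_sigsegv_by_daemon` is hypothetical and not part of any tool.

```shell
# Hypothetical helper: count "Received signal 11" lines per FRR daemon.
# Reads frr container logs on stdin. Assumes the layout
# "<timestamp> <DAEMON>: Received signal 11 ..." seen in this report.
count_sigsegv_by_daemon() {
  awk '/Received signal 11/ {
         daemon = $2             # second field is the daemon tag, e.g. "BGP:"
         sub(/:$/, "", daemon)   # drop the trailing colon
         count[daemon]++
       }
       END {
         for (d in count) printf "%s %d\n", d, count[d]
       }' | sort
}
```

It could be used per pod in the same way as the loop above, for example `omg logs -n metallb-system -c frr speaker-98mxj | count_sigsegv_by_daemon`.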
Restarting an affected pod seems to restore service only for a short time: from a few hours to a day.
Version-Release number of selected component (if applicable):
metallb-operator.4.12.0-202308071502, OCP 4.12.29
How reproducible:
Quite often, without any change being made on our side: roughly once a day to once every few days.
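To put numbers on the recurrence, the crash timestamps can be listed per pod. This is only a sketch, assuming the first field of each log line is the timestamp and the second is the daemon tag, as in the trace in this report. The function name `list_sigsegv_times` is hypothetical.

```shell
# Hypothetical helper: print the timestamp and daemon tag of every crash line.
# Reads frr container logs on stdin. Assumes the layout
# "<timestamp> <DAEMON>: Received signal 11 ..." seen in this report.
list_sigsegv_times() {
  awk '/Received signal 11/ {
         sub(/:$/, "", $2)   # strip the colon from the daemon tag
         print $1, $2
       }'
}
```

For example, `omg logs -n metallb-system -c frr speaker-rcfjx | list_sigsegv_times` would give one line per logged signal 11, which makes the gap between consecutive crashes easy to read off.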
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
No crash of bgpd
Additional info:
This looks quite similar to https://issues.redhat.com/browse/OCPBUGS-16795. We are opening this bug to confirm whether we are hitting the same issue.
- account is impacted by
  - OCPBUGS-26059 metallb frr logs error processing configuration change: duplicated prefix list value (Closed)
- depends on
  - RHEL-3040 Crash in FRR with MetalLB after reloading the configuration (Closed)
  - OCPBUGS-19053 Duplicate prefix lists in frr (Closed)
- impacts account
  - OCPBUGS-16795 FRR container does not respond intermittently and and log shows BFD crash (Closed)