-
Bug
-
Resolution: Done
-
Major
-
None
-
4.12.z
-
Important
-
No
-
CNF Network Sprint 242, CNF Network Sprint 243
-
2
-
Rejected
-
False
-
-
Customer Escalated
-
9/18: green
-
Description of problem:
Since the upgrade from OCP 4.10.X to OCP 4.12.29, we have begun to see issues on the metallb speaker pods. We have to restart the speaker pods at regular intervals to restore service. We see the following stack trace, related to a SIGSEGV (signal 11) in the bgpd daemon running in the speaker pod.

$ for pod in $(omg get pod -A -o wide|grep -i speaker | awk '{print $2}') ; do echo $pod ; omg logs -n metallb-system -c frr $pod |grep -A30 "Received signal 11" ; done|less
speaker-98mxj
2023-09-04T14:47:49.455643366Z BGP: Received signal 11 at 1693838869 (si_addr 0x2, PC 0x7fcaec333c75); aborting...
2023-09-04T14:47:49.455940093Z BGP: /usr/lib64/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x5b) [0x7fcaee3c129b]
2023-09-04T14:47:49.455940093Z BGP: /usr/lib64/frr/libfrr.so.0(zlog_signal+0xe1) [0x7fcaee3c1491]
2023-09-04T14:47:49.455940093Z BGP: /usr/lib64/frr/libfrr.so.0(+0x83b28) [0x7fcaee3e5b28]
2023-09-04T14:47:49.455940093Z BGP: /lib64/libpthread.so.0(+0x12ce0) [0x7fcaec63ece0]
2023-09-04T14:47:49.455940093Z BGP: /lib64/libc.so.6(+0xccc75) [0x7fcaec333c75]
2023-09-04T14:47:49.455940093Z BGP: /lib64/libyang.so.1(lydict_remove+0x49) [0x7fcaee079709]
2023-09-04T14:47:49.455952425Z BGP: /lib64/libyang.so.1(lyd_free_attr+0x7b) [0x7fcaee0e11fb]
2023-09-04T14:47:49.455952425Z BGP: /lib64/libyang.so.1(+0x7d0f1) [0x7fcaee0e40f1]
2023-09-04T14:47:49.455952425Z BGP: /lib64/libyang.so.1(+0x7d209) [0x7fcaee0e4209]
2023-09-04T14:47:49.455952425Z BGP: /lib64/libyang.so.1(+0x7d221) [0x7fcaee0e4221]
2023-09-04T14:47:49.455952425Z BGP: /lib64/libyang.so.1(+0x7d221) [0x7fcaee0e4221]
2023-09-04T14:47:49.455952425Z BGP: /lib64/libyang.so.1(+0x7d221) [0x7fcaee0e4221]
2023-09-04T14:47:49.455959853Z BGP: /usr/lib64/frr/libfrr.so.0(nb_config_replace+0x32) [0x7fcaee3ca702]
2023-09-04T14:47:49.455959853Z BGP: /usr/lib64/frr/libfrr.so.0(nb_candidate_commit_apply+0x61) [0x7fcaee3cd3b1]
2023-09-04T14:47:49.455959853Z BGP: /usr/lib64/frr/libfrr.so.0(nb_candidate_commit+0x9e) [0x7fcaee3cd4be]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(+0x6b8dc) [0x7fcaee3cd8dc]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(nb_cli_apply_changes+0x619) [0x7fcaee3d0959]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(+0x4a7c5) [0x7fcaee3ac7c5]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(+0x4ab81) [0x7fcaee3acb81]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(+0x39525) [0x7fcaee39b525]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(cmd_execute_command+0x71) [0x7fcaee39d6f1]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(cmd_execute+0xd0) [0x7fcaee39d910]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(+0x98da5) [0x7fcaee3fada5]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(+0x98f80) [0x7fcaee3faf80]
2023-09-04T14:47:49.455981286Z BGP: /usr/lib64/frr/libfrr.so.0(+0x9b9c0) [0x7fcaee3fd9c0]
2023-09-04T14:47:49.455990616Z BGP: /usr/lib64/frr/libfrr.so.0(thread_call+0x5a) [0x7fcaee3f52aa]
2023-09-04T14:47:49.455990616Z BGP: /usr/lib64/frr/libfrr.so.0(frr_run+0xe8) [0x7fcaee3bfe18]
2023-09-04T14:47:49.456010473Z BGP: /usr/lib/frr/bgpd(main+0x30c) [0x5571a251f9fc]
2023-09-04T14:47:49.456010473Z BGP: /lib64/libc.so.6(__libc_start_main+0xf3) [0x7fcaec2a1cf3]
2023-09-04T14:47:49.456010473Z BGP: /usr/lib/frr/bgpd(_start+0x2e) [0x5571a2521c2e]
---
2023-09-04T14:47:49.459087210Z BFD: Received signal 11 at 1693838869 (si_addr 0xffffffff8daa35f1, PC 0x7f605cf2ac92); aborting...
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x5b) [0x7f605cf1829b]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(zlog_signal+0xe1) [0x7f605cf18491]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x83b28) [0x7f605cf3cb28]
2023-09-04T14:47:49.459662341Z BFD: /lib64/libpthread.so.0(+0x12ce0) [0x7f605b195ce0]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x71c92) [0x7f605cf2ac92]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x7225d) [0x7f605cf2b25d]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(prefix_list_entry_update_finish+0x6c) [0x7f605cf2d26c]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x4f037) [0x7f605cf08037]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x6acc5) [0x7f605cf23cc5]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x6b06e) [0x7f605cf2406e]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(nb_candidate_commit_apply+0x37) [0x7f605cf24387]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(nb_candidate_commit+0x9e) [0x7f605cf244be]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x6b8dc) [0x7f605cf248dc]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(nb_cli_apply_changes+0x619) [0x7f605cf27959]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x4a7c5) [0x7f605cf037c5]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x4ab81) [0x7f605cf03b81]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x39525) [0x7f605cef2525]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(cmd_execute_command+0x71) [0x7f605cef46f1]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(cmd_execute+0xd0) [0x7f605cef4910]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x98da5) [0x7f605cf51da5]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x98f80) [0x7f605cf51f80]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(+0x9b9c0) [0x7f605cf549c0]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(thread_call+0x5a) [0x7f605cf4c2aa]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib64/frr/libfrr.so.0(frr_run+0xe8) [0x7f605cf16e18]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib/frr/bfdd(main+0x27b) [0x5628836673db]
2023-09-04T14:47:49.459662341Z BFD: /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f605adf8cf3]
2023-09-04T14:47:49.459662341Z BFD: /usr/lib/frr/bfdd(_start+0x2e) [0x5628836675be]
2023-09-04T14:47:49.459662341Z BFD: in thread vtysh_read scheduled from lib/vty.c:2682
2023-09-04T14:47:49.460284519Z 2023/09/04 14:47:49 WATCHFRR: [EC 268435457] bfdd state -> down : read returned EOF
2023-09-04T14:47:49.460405905Z 2023/09/04 14:47:49.460 ZEBRA: [EC 4043309122] Client 'bfd' encountered an error and is shutting down.

Counting the occurrences shows that the symptom affects nearly all speaker pods.

$ for pod in $(omg get pod -A -o wide|grep -i spea | awk '{print $2}') ; do SIGSEVCOUNT=$(omg logs -n metallb-system -c frr $pod |grep -ci "Received signal 11" ) ; printf "%s %d\n" $pod $SIGSEVCOUNT ; done
speaker-2hz45 7
speaker-66wcc 0
speaker-6fjrr 14
speaker-7pnzf 7
speaker-8bqvp 0
speaker-8fprv 0
speaker-98mxj 5
speaker-d4btd 5
speaker-dd95v 13
speaker-dl2x7 3
speaker-gwqg8 11
speaker-kbl82 21
speaker-rcfjx 28
speaker-s6xx5 3
speaker-v6mmk 0
speaker-vnf28 3
speaker-zlkt9 10

Restarting an affected pod seems to restore service for a short period of time: from a few hours to a day.
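The per-pod count above adds bgpd and bfdd crashes together, although the trace shows both daemons receiving signal 11. The following is a minimal sketch, not part of the original report, that splits the count by daemon. The function name count_crashes_by_daemon is hypothetical, and it assumes the log layout seen in the trace: a timestamp in the first field and a daemon tag such as "BGP:" or "BFD:" in the second.

```shell
# Hypothetical helper: reads FRR container log text on stdin and prints
# one "<daemon> <count>" line per daemon that logged "Received signal 11".
# Assumes field 1 is the timestamp and field 2 is the daemon tag ("BGP:").
count_crashes_by_daemon() {
  awk '/Received signal 11/ {
         d = $2            # daemon tag, e.g. "BGP:"
         sub(/:$/, "", d)  # strip the trailing colon
         n[d]++
       }
       END { for (d in n) printf "%s %d\n", d, n[d] }' | LC_ALL=C sort
}
```

With the same must-gather as above, it could be run per pod as: omg logs -n metallb-system -c frr $pod | count_crashes_by_daemon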
Version-Release number of selected component (if applicable):
metallb-operator.4.12.0-202308071502
OCP 4.12.29
How reproducible:
Quite often, without any change being made to the cluster: once a day or once every few days.
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
No crash of bgpd
Additional info:
This looks quite similar to https://issues.redhat.com/browse/OCPBUGS-16795. We are opening this bug to get confirmation that we are hitting the same issue.
- account is impacted by
-
OCPBUGS-26059 metallb frr logs error processing configuration change: duplicated prefix list value
- Closed
- depends on
-
RHEL-3040 Crash in FRR with MetalLB after reloading the configuration
- Closed
-
OCPBUGS-19053 Duplicate prefix lists in frr
- Closed
- impacts account
-
OCPBUGS-16795 FRR container does not respond intermittently and and log shows BFD crash
- Closed