VMs in customer's RHOSP deployment lost network connectivity during FFU 16.2 (ovn-2021) -> 17.1 (ovn23.09) procedure: after ovn-controller container was updated and restarted it failed to establish relationships with control plane. On all nodes running ovn-controller process same sequence of log messages is observed:
2024-06-10T18:20:22.655Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-controller.log 2024-06-10T18:20:22.655Z|00002|reconnect|INFO|unix:/run/openvswitch/db.sock: connecting... 2024-06-10T18:20:22.655Z|00003|reconnect|INFO|unix:/run/openvswitch/db.sock: connected 2024-06-10T18:20:22.662Z|00004|main|INFO|OVN internal version is : [22.12.2-20.27.0-70.6] 2024-06-10T18:20:22.662Z|00005|main|INFO|OVS IDL reconnected, force recompute. 2024-06-10T18:20:22.775Z|00006|reconnect|INFO|ssl:IP_ADDR:6642: connecting... 2024-06-10T18:20:22.775Z|00007|main|INFO|OVNSB IDL reconnected, force recompute. 2024-06-10T18:20:23.776Z|00008|reconnect|INFO|ssl:IP_ADDR:6642: connection attempt timed out 2024-06-10T18:20:24.777Z|00009|reconnect|INFO|ssl:IP_ADDR:6642: connecting... 2024-06-10T18:20:25.779Z|00010|reconnect|INFO|ssl:IP_ADDR:6642: connection attempt timed out 2024-06-10T18:20:25.779Z|00011|reconnect|INFO|ssl:IP_ADDR:6642: waiting 2 seconds before reconnect 2024-06-10T18:20:27.782Z|00012|reconnect|INFO|ssl:IP_ADDR:6642: connecting... 2024-06-10T18:20:29.785Z|00013|reconnect|INFO|ssl:IP_ADDR:6642: connection attempt timed out 2024-06-10T18:20:29.785Z|00014|reconnect|INFO|ssl:IP_ADDR:6642: waiting 4 seconds before reconnect 2024-06-10T18:20:33.789Z|00015|memory|INFO|11080 kB peak resident set size after 11.1 seconds 2024-06-10T18:20:33.789Z|00016|memory|INFO|idl-cells-Open_vSwitch:8014 2024-06-10T18:20:33.790Z|00017|reconnect|INFO|ssl:IP_ADDR:6642: connecting... 2024-06-10T18:20:37.795Z|00018|reconnect|INFO|ssl:IP_ADDR:6642: connection attempt timed out 2024-06-10T18:20:37.795Z|00019|reconnect|INFO|ssl:IP_ADDR:6642: continuing to reconnect in the background but suppressing further logging 2024-06-10T18:21:06.243Z|00020|reconnect|INFO|ssl:IP_ADDR:6642: connected 2024-06-10T18:21:06.251Z|00021|ovsdb_idl|WARN|OVN_Southbound database lacks Chassis_Template_Var table (database needs upgrade?) 2024-06-10T18:21:06.251Z|00022|ovsdb_idl|WARN|Load_Balancer table in OVN_Southbound database lacks datapath_group column (database needs upgrade?) 2024-06-10T18:21:06.251Z|00023|ovsdb_idl|WARN|MAC_Binding table in OVN_Southbound database lacks timestamp column (database needs upgrade?) 2024-06-10T18:21:06.251Z|00024|ovsdb_idl|WARN|OVN_Southbound database lacks Mirror table (database needs upgrade?) 2024-06-10T18:21:06.251Z|00025|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks additional_chassis column (database needs upgrade?) 2024-06-10T18:21:06.251Z|00026|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks additional_encap column (database needs upgrade?) 2024-06-10T18:21:06.251Z|00027|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks mirror_rules column (database needs upgrade?) 2024-06-10T18:21:06.251Z|00028|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks port_security column (database needs upgrade?) 2024-06-10T18:21:06.251Z|00029|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks requested_additional_chassis column (database needs upgrade?) 2024-06-10T18:21:06.251Z|00030|ovsdb_idl|WARN|OVN_Southbound database lacks Static_MAC_Binding table (database needs upgrade?) 2024-06-10T18:21:12.362Z|00031|memory|INFO|peak resident set size grew 217% in last 38.6 seconds, from 11080 kB to 35104 kB 2024-06-10T18:21:12.362Z|00032|memory|INFO|idl-cells-Open_vSwitch:8014 2024-06-10T18:21:20.899Z|00033|features|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch 2024-06-10T18:21:20.900Z|00034|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting... 2024-06-10T18:21:20.908Z|00035|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected 2024-06-10T18:21:20.911Z|00036|features|INFO|OVS Feature: ct_zero_snat, state: supported 2024-06-10T18:21:20.912Z|00037|main|INFO|OVS feature set changed, force recompute. 2024-06-10T18:21:20.913Z|00038|ofctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch 2024-06-10T18:21:20.915Z|00039|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting... 2024-06-10T18:21:20.917Z|00040|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected 2024-06-10T18:21:20.918Z|00041|ofctrl|INFO|ofctrl-wait-before-clear is now 8000 ms (was 0 ms) 2024-06-10T18:21:20.921Z|00042|main|INFO|OVS OpenFlow connection reconnected,force recompute. 2024-06-10T18:21:20.965Z|00001|pinctrl(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch 2024-06-10T18:21:20.965Z|00002|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting... 2024-06-10T18:21:20.965Z|00003|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected 2024-06-10T18:21:20.989Z|00043|timeval|WARN|Unreasonably long 1128ms poll interval (923ms user, 198ms system) 2024-06-10T18:21:20.989Z|00044|timeval|WARN|faults: 88544 minor, 0 major 2024-06-10T18:21:20.990Z|00045|timeval|WARN|disk: 0 reads, 8 writes 2024-06-10T18:21:20.991Z|00046|timeval|WARN|context switches: 0 voluntary, 10 involuntary 2024-06-10T18:21:20.994Z|00047|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=97cf558d: 2024-06-10T18:21:20.994Z|00048|coverage|INFO|cmap_expand 0.0/sec 0.000/sec 0.0000/sec total: 3 2024-06-10T18:21:20.994Z|00049|coverage|INFO|netlink_sent 0.0/sec 0.000/sec 0.0000/sec total: 12 2024-06-10T18:21:20.994Z|00050|coverage|INFO|netlink_recv_jumbo 0.0/sec 0.000/sec 0.0000/sec total: 1 2024-06-10T18:21:20.994Z|00051|coverage|INFO|netlink_received 0.0/sec 0.000/sec 0.0000/sec total: 12 2024-06-10T18:21:20.994Z|00052|coverage|INFO|vconn_sent 0.0/sec 0.000/sec 0.0000/sec total: 5 2024-06-10T18:21:20.994Z|00053|coverage|INFO|vconn_received 0.0/sec 0.000/sec 0.0000/sec total: 4 2024-06-10T18:21:20.994Z|00054|coverage|INFO|vconn_open 0.0/sec 0.000/sec 0.0000/sec total: 3 2024-06-10T18:21:20.994Z|00055|coverage|INFO|util_xalloc 396182.8/sec 53981.183/sec 899.6864/sec total: 5873685 2024-06-10T18:21:20.994Z|00056|coverage|INFO|stream_open 0.0/sec 0.117/sec 0.0019/sec total: 10 2024-06-10T18:21:20.994Z|00057|coverage|INFO|pstream_open 0.0/sec 0.017/sec 0.0003/sec total: 1 2024-06-10T18:21:20.994Z|00058|coverage|INFO|seq_change 22.8/sec 3.400/sec 0.0567/sec total: 243 2024-06-10T18:21:20.994Z|00059|coverage|INFO|rconn_sent 0.0/sec 0.000/sec 0.0000/sec total: 2 2024-06-10T18:21:20.994Z|00060|coverage|INFO|rconn_queued 0.0/sec 0.000/sec 0.0000/sec total: 2 2024-06-10T18:21:20.994Z|00061|coverage|INFO|poll_zero_timeout 22.0/sec 2.833/sec 0.0472/sec total: 196 2024-06-10T18:21:20.994Z|00062|coverage|INFO|poll_create_node 69.2/sec 10.300/sec 0.1717/sec total: 719 2024-06-10T18:21:20.994Z|00063|coverage|INFO|txn_incomplete 0.0/sec 0.000/sec 0.0000/sec total: 2 2024-06-10T18:21:20.994Z|00064|coverage|INFO|txn_unchanged 22.8/sec 3.267/sec 0.0544/sec total: 223 2024-06-10T18:21:20.994Z|00065|coverage|INFO|hmap_reserve 0.0/sec 0.000/sec 0.0000/sec total: 76 2024-06-10T18:21:20.994Z|00066|coverage|INFO|hmap_expand 8996.4/sec 1176.017/sec 19.6003/sec total: 91957 2024-06-10T18:21:20.994Z|00067|coverage|INFO|hmap_pathological 0.4/sec 0.817/sec 0.0136/sec total: 257 2024-06-10T18:21:20.994Z|00068|coverage|INFO|miniflow_malloc 0.0/sec 0.000/sec 0.0000/sec total: 272 2024-06-10T18:21:20.994Z|00069|coverage|INFO|physical_run 0.0/sec 0.000/sec 0.0000/sec total: 1 2024-06-10T18:21:20.994Z|00070|coverage|INFO|consider_logical_flow 0.0/sec 0.000/sec 0.0000/sec total: 70000 2024-06-10T18:21:20.994Z|00071|coverage|INFO|lflow_run 0.0/sec 0.000/sec 0.0000/sec total: 1 2024-06-10T18:21:20.994Z|00072|coverage|INFO|116 events never hit 2024-06-10T18:21:25.915Z|00073|memory|INFO|peak resident set size grew 653% in last 13.6 seconds, from 35104 kB to 264164 kB 2024-06-10T18:21:25.915Z|00074|memory|INFO|idl-cells-OVN_Southbound:862796 idl-cells-Open_vSwitch:8014 ofctrl_desired_flow_usage-KB:65 ofctrl_sb_flow_ref_usage-KB:20 2024-06-10T18:21:52.209Z|00004|fatal_signal(ovn_pinctrl0)|WARN|terminating with signal 15 (Terminated)