Cold boot AWS host => Try to start VMs - weird error about MTUs missing from the ostestbm network => `virsh net-list` shows ostestbm is up => Retrying to start the VMs works.

Hop onto master-0 => `sudo pcs status --full`:

```
Cluster name: TNF
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: master-0 (1) (version 2.1.9-1.2.el9_6-49aab9983) - partition with quorum
  * Last updated: Mon Sep 22 15:50:26 2025 on master-0
  * Last change: Fri Sep 19 18:06:29 2025 by root via root on master-1
  * 2 nodes configured
  * 6 resource instances configured

Node List:
  * Node master-0 (1): online, feature set 3.19.6
  * Node master-1 (2): online, feature set 3.19.6

Full List of Resources:
  * Clone Set: kubelet-clone [kubelet]:
    * kubelet (systemd:kubelet): Starting master-0
    * kubelet (systemd:kubelet): Starting master-1
  * master-0_redfish (stonith:fence_redfish): FAILED master-0
  * master-1_redfish (stonith:fence_redfish): FAILED master-1
  * Clone Set: etcd-clone [etcd]:
    * etcd (ocf:heartbeat:podman-etcd): Stopped
    * etcd (ocf:heartbeat:podman-etcd): Stopped

Node Attributes:
  * Node: master-0 (1):
    * cluster_id : 18178232792881147661
    * member_id  : 690b6ec0bad7163e
    * node_ip    : 192.168.111.20
    * revision   : 2127843
  * Node: master-1 (2):
    * cluster_id :
    * member_id  : 151f26685a878545
    * node_ip    : 192.168.111.21
    * revision   : 2127997

Migration Summary:
  * Node: master-0 (1):
    * master-0_redfish: migration-threshold=1000000 fail-count=1000000 last-failure='Mon Sep 22 15:49:54 2025'
  * Node: master-1 (2):
    * master-1_redfish: migration-threshold=1000000 fail-count=1000000 last-failure='Mon Sep 22 15:49:54 2025'

Failed Resource Actions:
  * master-0_redfish_start_0 on master-0 'error' (1): call=21, status='complete', last-rc-change='Mon Sep 22 15:49:52 2025', queued=0ms, exec=1946ms
  * master-1_redfish_start_0 on master-1 'error' (1): call=21, status='complete', last-rc-change='Mon Sep 22 15:49:52 2025', queued=0ms, exec=1999ms

Tickets:

PCSD Status:
  master-0: Online
  master-1: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
```

Pacemaker log:

```
Sep 22 15:53:04 master-0 pacemaker-schedulerd[1751]: warning: Unexpected result (error) was recorded for start of master-0_redfish on master-0 at Sep 22 15:49:52 2025
Sep 22 15:53:04 master-0 pacemaker-schedulerd[1751]: warning: Unexpected result (error) was recorded for start of master-1_redfish on master-0 at Sep 22 15:51:00 2025
Sep 22 15:53:04 master-0 pacemaker-schedulerd[1751]: warning: Unexpected result (error) was recorded for start of master-0_redfish on master-1 at Sep 22 15:51:00 2025
Sep 22 15:53:04 master-0 pacemaker-schedulerd[1751]: warning: Unexpected result (error) was recorded for start of master-1_redfish on master-1 at Sep 22 15:49:52 2025
Sep 22 15:53:04 master-0 pacemaker-schedulerd[1751]: warning: master-0_redfish cannot run on master-0 due to reaching migration threshold (clean up resource to allow again)
Sep 22 15:53:04 master-0 pacemaker-schedulerd[1751]: warning: master-1_redfish cannot run on master-0 due to reaching migration threshold (clean up resource to allow again)
Sep 22 15:53:04 master-0 pacemaker-schedulerd[1751]: warning: master-0_redfish cannot run on master-1 due to reaching migration threshold (clean up resource to allow again)
Sep 22 15:53:04 master-0 pacemaker-schedulerd[1751]: warning: master-1_redfish cannot run on master-1 due to reaching migration threshold (clean up resource to allow again)
```

Both nodes restart. One recovers, and the other joins as a learner.
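Before restarting anything, a quick way to confirm the stonith failures are just the Redfish backend being unreachable after the cold boot (rather than a broken agent config) is to pull the fence_redfish parameters from the cluster and probe that endpoint directly. This is only a hedged sketch; the `<redfish-ip>`/`<redfish-port>` placeholders stand in for whatever dev-scripts configured here:

```
# On master-0: show the configured fence_redfish parameters (ip, port, credentials).
sudo pcs stonith config master-0_redfish

# Probe the Redfish endpoint those parameters point at (substitute values from
# the output above). A connection refused/timeout here would line up with the
# failed start actions, since sushy-tools was not running after the cold boot.
curl -k https://<redfish-ip>:<redfish-port>/redfish/v1/
```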
Fencing is down at this point, so I restart the fencing containers used by dev-scripts:

```
[ec2-user@aws-jpoulin-dev ~]$ sudo podman start vbmc
vbmc
[ec2-user@aws-jpoulin-dev ~]$ sudo podman start sushy-tools
sushy-tools
```
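With vbmc and sushy-tools back, the fence_redfish resources still will not start on their own, because their fail-count has already hit the migration threshold; Pacemaker needs a resource cleanup before it retries them. A rough sketch of the follow-up, assuming the dev-scripts `vbmc` container provides the `vbmc` CLI and nothing else about the fencing config changed:

```
# On the AWS host: confirm the virtual BMCs are listening again.
sudo podman exec vbmc vbmc list

# On master-0: clear the fail-counts so Pacemaker retries the stonith starts,
# then watch them come back in the status output.
sudo pcs resource cleanup master-0_redfish
sudo pcs resource cleanup master-1_redfish
sudo pcs status --full
```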