RHEL-86266

Pacemaker takes 5+ minutes to elect new DC node after cluster outage


    • Bug
    • Resolution: Cannot Reproduce
    • rhel-9.4
    • pacemaker
    • rhel-ha

      What were you trying to do that didn't work?

      While testing a cluster outage, the Pacemaker cluster took more than 5 minutes to re-elect the DC node, and automation was paused for that entire time.

      What is the impact of this issue to you?

      Automation is paused for too long after the cluster outage is resolved. This happens only when nodes inside the cluster are rebooted one by one: once we lose quorum, fencing kicks in and reboots all remaining nodes. When all nodes are rebooted simultaneously, DC election happens quickly. Is this expected behaviour?

      Please provide the package NVR for which the bug is seen:

      2.1.6-4

      How reproducible is this bug?:

      It happens every time the cluster is rebooted.

      Steps to reproduce

      1. Set the quorum type to majority.
      2. Kill nodes one by one until 50% of the nodes are down.
      3. Observe that automation does not resume until the DC node re-election has finished.
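
The point at which step 2 costs the cluster its quorum can be sketched with simple majority arithmetic. This is only an illustration and makes several assumptions: one vote per node, no quorum device, and none of the corosync votequorum options such as `two_node`, `auto_tie_breaker` or `last_man_standing`. The report does not state the cluster size, so the 4-node figure below is hypothetical, as are the function names.

```python
def majority_quorum(total_votes: int) -> int:
    """Smallest vote count that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, nodes_up: int) -> bool:
    """True if the surviving nodes still hold a strict majority.

    Assumes one vote per node (an assumption, not taken from the report).
    """
    return nodes_up >= majority_quorum(total_votes)


if __name__ == "__main__":
    total = 4  # hypothetical cluster size, not stated in the report
    # Take nodes down one at a time and show when quorum is lost.
    for up in range(total, -1, -1):
        state = "quorate" if has_quorum(total, up) else "NOT quorate"
        print(f"{up}/{total} nodes up: {state}")
```

Under these assumptions, a cluster with an even number of nodes is no longer quorate once exactly 50% of its nodes are down, which is the condition that step 2 produces.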

      Expected results

      • DC node re-election should happen quickly

      Actual results

      • DC node re-election takes a long time (5+ minutes)
        ```

      Automation is paused while DC node election is ongoing
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Fence (reboot) pcmkHost-4 'node is unclean'
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_partitionset_pcmkUser_3_4_5     ( pcmkHost-2 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_partitionset_pcmkUser_6_7_8     ( pcmkHost-3 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_partitionset_pcmkUser_0_1_2     ( pcmkHost-5 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_partition_pcmkUser_0            ( pcmkHost-5 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_partition_pcmkUser_1            ( pcmkHost-5 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_partition_pcmkUser_2            ( pcmkHost-5 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_partition_pcmkUser_3            ( pcmkHost-2 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_partition_pcmkUser_4            ( pcmkHost-2 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_partition_pcmkUser_5            ( pcmkHost-2 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_partition_pcmkUser_6            ( pcmkHost-3 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_partition_pcmkUser_7            ( pcmkHost-3 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_partition_pcmkUser_8            ( pcmkHost-3 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_ethmonitor_pcmkHost-2_enp1s0f0    ( pcmkHost-2 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_data_pcmkUser_NODE0001    ( pcmkHost-5 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_data_pcmkUser_NODE0002    ( pcmkHost-5 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_data_pcmkUser_NODE0000    ( pcmkHost-5 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_data_pcmkUser_NODE0007    ( pcmkHost-3 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_data_pcmkUser_NODE0006    ( pcmkHost-3 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_data_pcmkUser_NODE0008    ( pcmkHost-3 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_data_pcmkUser_NODE0004    ( pcmkHost-2 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_data_pcmkUser_NODE0003    ( pcmkHost-2 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_data_pcmkUser_NODE0005    ( pcmkHost-2 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_logsfs_pcmkUser_NODE0001     ( pcmkHost-5 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_logsfs_pcmkUser_NODE0002     ( pcmkHost-5 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_logsfs_pcmkUser_NODE0000     ( pcmkHost-5 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_logsfs_pcmkUser_NODE0007     ( pcmkHost-3 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_logsfs_pcmkUser_NODE0006     ( pcmkHost-3 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_logsfs_pcmkUser_NODE0008     ( pcmkHost-3 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_logsfs_pcmkUser_NODE0004     ( pcmkHost-2 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_logsfs_pcmkUser_NODE0003     ( pcmkHost-2 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_mount_logsfs_pcmkUser_NODE0005     ( pcmkHost-2 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_ethmonitor_pcmkHost-3_enp1s0f0       ( pcmkHost-3 )
      Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Start      db2_ethmonitor_pcmkHost-5_enp1s0f0       ( pcmkHost-5 )
      Apr  3 14:49:41 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:41 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:41 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:41 pcmkHost-3 pacemaker-attrd[5741]: notice: Recorded new attribute writer: pcmkHost-4 (was pcmkHost-2)
      Apr  3 14:49:41 pcmkHost-3 pacemaker-attrd[5741]: notice: Setting #attrd-protocol[pcmkHost-4] in instance_attributes: (unset) -> 6
      Apr  3 14:49:41 pcmkHost-3 pacemaker-attrd[5741]: notice: Recorded new attribute writer: pcmkHost-2 (was pcmkHost-4)
      Apr  3 14:49:43 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:43 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:43 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:44 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:45 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:45 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:45 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:47 pcmkHost-3 pacemaker-based[5738]: notice: Local CIB 0.409.364.a574d9811fc73e2f60ce4cee3392bd3d differs from pcmkHost-4: 0.409.126.de56cdd9c8d74457ece0ffe3f0e49cbe 0x55685f4a5a70
      Apr  3 14:49:47 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:47 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:47 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:47 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:49 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:49 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:49 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:51 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:51 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:51 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:53 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:54 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:54 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:54 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:56 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:56 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:56 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:58 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:58 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:58 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress
      Apr  3 14:49:59 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress

      DC node election finally finishes
      Apr 03 14:54:31.129 pcmkHost-3 pacemaker-controld  [5743] (register_fsa_error_adv)        info: Resetting the current action list
      Apr 03 14:54:31.129 pcmkHost-3 pacemaker-controld  [5743] (do_log)        warning: Input I_ELECTION_DC received in state S_FINALIZE_JOIN from finalize_sync_callback
      Apr 03 14:54:31.129 pcmkHost-3 pacemaker-controld  [5743] (do_state_transition)   info: State transition S_FINALIZE_JOIN -> S_INTEGRATION | input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=finalize_sync_callback
      Apr 03 14:54:31.129 pcmkHost-3 pacemaker-controld  [5743] (update_dc)     info: Unset DC (was pcmkHost-3)
      Apr 03 14:54:31.129 pcmkHost-3 pacemaker-controld  [5743] (join_make_offer)       info: Sending join-2 offer to pcmkHost-4
      Apr 03 14:54:31.129 pcmkHost-3 pacemaker-controld  [5743] (join_make_offer)       info: Sending join-2 offer to pcmkHost-2
      Apr 03 14:54:31.129 pcmkHost-3 pacemaker-controld  [5743] (join_make_offer)       info: Sending join-2 offer to pcmkHost-5
      Apr 03 14:54:31.129 pcmkHost-3 pacemaker-controld  [5743] (join_make_offer)       info: Sending join-2 offer to pcmkHost-3
      Apr 03 14:54:31.129 pcmkHost-3 pacemaker-controld  [5743] (do_dc_join_offer_all)  info: Waiting on join-2 requests from 4 outstanding nodes
      Apr 03 14:54:31.130 pcmkHost-3 pacemaker-controld  [5743] (update_dc)     info: Set DC to pcmkHost-3 (3.19.6)

      ```
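
The length of the stall can be read directly off the timestamps in the excerpt above. The following is a minimal sketch of that calculation, not a Pacemaker tool. It assumes that both log formats shown (syslog style `Apr  3 14:49:23` and detail-log style `Apr 03 14:54:31.129`) come from the same host clock and the same year; because neither format carries a year, a placeholder year is used. The helper names and the three sample lines (copied from the excerpt) are for illustration only.

```python
import re
from datetime import datetime

MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Month, day, HH:MM:SS and an optional fractional-second part.
TS_RE = re.compile(
    r"^\s*([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,6}))?")


def parse_ts(line, year=2000):
    """Return the leading timestamp of a log line, or None if there is none.

    The year is a placeholder: the log lines themselves do not contain one.
    """
    m = TS_RE.match(line)
    if m is None or m.group(1) not in MONTHS:
        return None
    month, day, hh, mm, ss, frac = m.groups()
    micro = int(frac.ljust(6, "0")) if frac else 0
    return datetime(year, MONTHS[month], int(day), int(hh), int(mm), int(ss), micro)


def timestamps(lines, marker):
    """Timestamps of all lines containing marker, in input order."""
    return [ts for line in lines
            if marker in line and (ts := parse_ts(line)) is not None]


SAMPLE = [
    "Apr  3 14:49:23 pcmkHost-3 pacemaker-schedulerd[5742]: notice: Actions: Fence (reboot) pcmkHost-4 'node is unclean'",
    "Apr  3 14:49:59 pcmkHost-3 pacemaker-controld[5743]: warning: Delaying join-1 finalization while transition in progress",
    "Apr 03 14:54:31.130 pcmkHost-3 pacemaker-controld  [5743] (update_dc)     info: Set DC to pcmkHost-3 (3.19.6)",
]

if __name__ == "__main__":
    fence = timestamps(SAMPLE, "Fence (reboot)")[0]
    last_delay = timestamps(SAMPLE, "Delaying join-1 finalization")[-1]
    dc_set = timestamps(SAMPLE, "Set DC to")[-1]
    print("fence scheduled -> DC set:", (dc_set - fence).total_seconds(), "s")
    print("last join delay -> DC set:", (dc_set - last_delay).total_seconds(), "s")
```

With the three sample lines, the interval from the scheduled fence action to `Set DC to pcmkHost-3` is roughly 5 minutes 8 seconds. Roughly 4 minutes 32 seconds of that fall after the last `Delaying join-1 finalization` warning shown in the excerpt.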

              rhn-support-clumens Christopher Lumens
              donghohan@ibm.com Dongho Han
              Christopher Lumens
              Cluster QE