• Bug
    • Resolution: Unresolved
    • Icon: Undefined Undefined
    • None
    • rhel-9.4
    • pacemaker
    • None
    • No
    • None
    • rhel-sst-high-availability
    • None
    • False
    • None
    • None
    • None
    • None
    • Unspecified
    • Unspecified
    • Unspecified
    • All
    • None

      What were you trying to do that didn't work?

      • Running a Pacemaker domain and testing HA by killing resources to check that automation recovers them.
      • ```

      Mar 15 05:18:27 p9dpflnx01 pacemakerd[162767]: crit: pacemaker-schedulerd[162772] is unresponsive to IPC after 5 attempts and will now be killed
      Mar 15 05:18:27 p9dpflnx01 pacemakerd[162767]: notice: Stopping pacemaker-schedulerd
      Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: Connection to pengine IPC failed
      Mar 15 05:18:27 p9dpflnx01 pacemakerd[162767]: warning: pacemaker-schedulerd[162772] terminated with signal 9 (Killed)
      Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: Connection to pengine closed
      Mar 15 05:18:27 p9dpflnx01 pacemakerd[162767]: notice: Respawning pacemaker-schedulerd subdaemon after unexpected exit
      Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: crit: Lost connection to the scheduler
      Mar 15 05:18:27 p9dpflnx01 pacemaker-schedulerd[3948119]: notice: Starting Pacemaker scheduler
      Mar 15 05:18:27 p9dpflnx01 pacemaker-schedulerd[3948119]: notice: Pacemaker scheduler successfully started and accepting connections
      Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: notice: Saved Cluster Information Base to /var/lib/pacemaker/pengine/pe-core-7890d349-3810-48ea-a0f0-ea44dbe2ad34.bz2 after scheduler crash
      Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: Input I_ERROR received in state S_POLICY_ENGINE from save_cib_contents
      Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: warning: State transition S_POLICY_ENGINE -> S_RECOVERY
      Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: warning: Fast-tracking shutdown in response to errors
      Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: warning: Not voting in election, we're in state S_RECOVERY
      Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: Input I_TERMINATE received in state S_RECOVERY from do_recover
      Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: notice: Stopped 0 recurring operations at shutdown (11 remaining)
      Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: notice: Recurring action db2_partitionset_jstamko2_0_1_2:16819 (db2_partitionset_jstamko2_0_1_2_monitor_10000) incomplete at shutdown
      Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: 1 resource was active at shutdown
      Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: notice: Disconnected from Corosync
      Mar 15 05:18:28 p9dpflnx01 pacemaker-controld[162773]: error: Could not recover from internal error
      Mar 15 05:18:28 p9dpflnx01 pacemakerd[162767]: error: pacemaker-controld[162773] exited with status 1 (Error occurred)
      Mar 15 05:18:28 p9dpflnx01 pacemakerd[162767]: notice: Respawning pacemaker-controld subdaemon after unexpected exit
      ```

      I want to know how to find the root cause of this failure and how to avoid it.

      What is the impact of this issue to you?

      • An unexpected Pacemaker failure can open timing windows in which automation is briefly disabled because a Pacemaker process has failed.

        Please provide the package NVR for which the bug is seen:

        2.1.6-4

        How reproducible is this bug?: Not easily reproducible; we need to wait until Pacemaker fails.

        Steps to reproduce

      1.  
      2.  
      3.  

      Expected results: Pacemaker stays up without error

      Actual results: A Pacemaker process is killed by pacemakerd and then restarted

      How can I find any logs related to this? Both the system log (messages) and the Pacemaker log (pacemaker.log) show only that Pacemaker detected an internal error and was killed.
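As a starting point for narrowing down the logs, the following is a minimal sketch (not a Pacemaker tool) that parses syslog-style lines like those in the excerpt above and keeps only the `crit` and `error` entries, so the earliest severe message stands out. The names `LINE_RE`, `parse_line`, `severe_events`, and `SAMPLE` are hypothetical helpers introduced for illustration; the sample lines are copied from the excerpt, and the regular expression assumes every line follows that same layout.

```python
import re

# Assumed layout, taken from the excerpt in this report:
#   Mar 15 05:18:27 host daemon[pid]: level: message
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>[\w-]+)\[(?P<pid>\d+)\]:\s+(?P<level>\w+):\s+(?P<msg>.*)$"
)

# Severity labels treated as "severe" in this sketch (an assumption).
SEVERE = {"crit", "error"}


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def severe_events(lines):
    """Return the parsed crit/error records in their original order."""
    events = []
    for line in lines:
        record = parse_line(line)
        if record and record["level"] in SEVERE:
            events.append(record)
    return events


# First four lines of the excerpt in this report.
SAMPLE = [
    "Mar 15 05:18:27 p9dpflnx01 pacemakerd[162767]: crit: pacemaker-schedulerd[162772] is unresponsive to IPC after 5 attempts and will now be killed",
    "Mar 15 05:18:27 p9dpflnx01 pacemakerd[162767]: notice: Stopping pacemaker-schedulerd",
    "Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: Connection to pengine IPC failed",
    "Mar 15 05:18:27 p9dpflnx01 pacemakerd[162767]: warning: pacemaker-schedulerd[162772] terminated with signal 9 (Killed)",
]

if __name__ == "__main__":
    for event in severe_events(SAMPLE):
        print(event["ts"], event["daemon"], event["level"], event["msg"])
```

Run against the full pacemaker.log, the same filter could be widened to include `warning` entries or restricted to the minutes before the first `crit` line, which in the excerpt is the scheduler becoming unresponsive to IPC.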

        1. Larger_PCMK.log
          60.93 MB
        2. pacemaker.log
          578 kB

              rhn-engineering-cfeist Chris Feist
              donghohan@ibm.com Dongho Han
              Chris Feist Chris Feist
              Cluster QE Cluster QE