- Bug
- Resolution: Unresolved
- Minor
- rhel-9.4
- Low
- rhel-sst-high-availability
- ssg_filesystems_storage_and_HA
- x86_64
What were you trying to do that didn't work?
A monitor failure should have caused Pacemaker to restart a resource, but Pacemaker ignored the failure until a resource refresh was run manually.
What is the impact of this issue to you?
Automatic resource recovery is not happening correctly.
Please provide the package NVR for which the bug is seen:
How reproducible is this bug?:
Yes
Steps to reproduce
- Set up two resources with monitoring scripts, and add a 300-second sleep to one of the resource actions
- Intentionally fail both resources: first fail the resource whose action has the 300-second sleep, then kill the second resource
- Observe that the first resource keeps reporting a monitor failure, but Pacemaker ignores it until I manually run a crm_resource refresh on that resource
```
16389 Sep 16 15:00:11 db2hadr(db2_regress1_regress1_HARA)[110540]: INFO: demote: 1063: regress1: 0: HARA: db2hadr_demote() sleep exit.
16390 Sep 16 15:00:11 db2hadr(db2_regress1_regress1_HARA)[110540]: ERROR: demote: 494: No db2sysc process detected in ps output
16391 Sep 16 15:00:11 db2hadr(db2_regress1_regress1_HARA)[110540]: ERROR: demote: 737: regress1: 0: HARA: Instance is not up. db2hadr_instance_monitor() failed with rc=7, db2hadr_monitor() exit with rc=7.
```
The lines above mark the beginning of the failure. The monitor reports a failure every 10 seconds, yet Pacemaker ignores it: the initial failure was reported 300 seconds earlier and was ignored because Pacemaker was busy running the db2hadr_promote() action.
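To check how long the monitor has been reporting failures without a reaction, the rc=7 lines can be pulled out of the resource agent log. The following is only a rough sketch: the regular expression is modeled on the three log lines quoted above, and the function names and the `year` default are assumptions made for illustration, since the log timestamps carry no year.

```python
import re
from datetime import datetime

# Pattern modeled on the log excerpt above; the leading number (e.g. "16391")
# is treated as an optional line number from the log viewer.
LOG_RE = re.compile(
    r"^(?:\d+\s+)?(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"db2hadr\((?P<rsc>[^)]+)\)\[\d+\]: ERROR: .*rc=7"
)


def monitor_failures(lines, year=2024):
    """Return (timestamp, resource) for every ERROR line that reports rc=7."""
    hits = []
    for line in lines:
        match = LOG_RE.match(line)
        if match:
            # The log has no year, so one is supplied by the caller (assumption).
            ts = datetime.strptime(f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S")
            hits.append((ts, match.group("rsc")))
    return hits


def failure_window_seconds(hits):
    """Seconds between the first and the last reported rc=7 failure."""
    if not hits:
        return 0.0
    return (hits[-1][0] - hits[0][0]).total_seconds()
```

Feeding the full log through `monitor_failures()` and then `failure_window_seconds()` gives the length of time during which failures were reported, which can be compared against the 300-second sleep.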
Expected results
Regardless of how long ago the failure was first reported, if the monitoring script is still reporting a failure, Pacemaker should act on it and recover the resource.
Actual results
Even though the monitoring script keeps reporting a failure, Pacemaker still does not recover the resource.
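For reference, the manual refresh that works around the problem could be scripted roughly as below. This is a sketch, not the reporter's exact procedure: it assumes the resource ID is the one shown in the log excerpt (`db2_regress1_regress1_HARA`), and the helper names and the `dry_run` switch are hypothetical. It relies only on the `--refresh`, `--resource`, and `--node` options of `crm_resource`.

```python
import subprocess


def build_refresh_command(resource, node=None):
    """Build the crm_resource invocation that refreshes one resource."""
    cmd = ["crm_resource", "--refresh", "--resource", resource]
    if node:
        # Optionally limit the refresh to a single cluster node.
        cmd += ["--node", node]
    return cmd


def refresh(resource, node=None, dry_run=True):
    """Return the command; run it on the cluster only when dry_run is False."""
    cmd = build_refresh_command(resource, node)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Resource ID taken from the log excerpt; adjust for the actual cluster.
    print(" ".join(refresh("db2_regress1_regress1_HARA")))
```

With the default `dry_run=True` the script only prints the command, so it can be reviewed before being run on a cluster node.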
pcmk-Wed-18-Sep-2024.tar.bz2
pcmk-Wed-18-Sep-2024-srv-2.tar.bz2