Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: rhel-9.4
Component/s: pacemaker
Labels:
None

Regression:
No
Severity:
Low

Pool Team:

rhel-sst-high-availability
Sub-System Group:

ssg_filesystems_storage_and_HA

Story Points:
1
Blocked:
False
Blocked Reason:

None
Product Documentation Required:
None
Sprint:
None

Preliminary Testing:
None
Test Coverage:
None

Architecture:

x86_64

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Planning:
None

What were you trying to do that didn't work?

A monitor failure should have caused Pacemaker to restart the resource, but Pacemaker ignored the failure until a resource refresh was run manually.

What is the impact of this issue to you?

Automated resource recovery is not happening correctly.

Please provide the package NVR for which the bug is seen:

How reproducible is this bug?:

Yes

Steps to reproduce

Configure two resources with monitoring scripts, and add a sleep to one of the resource actions.
Intentionally fail both resources: first fail the resource whose action sleeps for 300 seconds, then kill the second resource.
Observe that the first resource keeps reporting monitor failures, but Pacemaker ignores them until I manually run crm_resource --refresh for that resource.
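
The manual workaround in the last step can be sketched as a small helper. This is only an illustration: it assumes the name shown in the agent log (`db2_regress1_regress1_HARA`) is the resource ID that Pacemaker knows, and the function names `refresh_command` and `refresh` are hypothetical. It defaults to a dry run so nothing is executed unless requested.

```python
import shlex
import subprocess


def refresh_command(resource_id):
    """Build the argv that asks Pacemaker to re-detect one resource's state."""
    return ["crm_resource", "--refresh", "--resource", resource_id]


def refresh(resource_id, dry_run=True):
    """Return the command line as a string; run it only when dry_run is False."""
    argv = refresh_command(resource_id)
    if not dry_run:
        # Raises CalledProcessError if crm_resource exits non-zero.
        subprocess.run(argv, check=True)
    return shlex.join(argv)


# Assumption: the resource ID matches the name in the db2hadr log prefix.
print(refresh("db2_regress1_regress1_HARA"))
```

Running it with `dry_run=False` on a cluster node would perform the same refresh that was needed here to make Pacemaker notice the failure.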

```
Sep 16 15:00:11 db2hadr(db2_regress1_regress1_HARA)[110540]: INFO: demote: 1063: regress1: 0: HARA: db2hadr_demote() sleep exit.
Sep 16 15:00:11 db2hadr(db2_regress1_regress1_HARA)[110540]: ERROR: demote: 494: No db2sysc process detected in ps output
Sep 16 15:00:11 db2hadr(db2_regress1_regress1_HARA)[110540]: ERROR: demote: 737: regress1: 0: HARA: Instance is not up. db2hadr_instance_monitor() failed with rc=7, db2hadr_monitor() exit with rc=7.
```

The lines above mark the beginning of the failure. The monitor reports a failure every 10 seconds, yet Pacemaker ignores it, because the initial failure was reported 300 seconds earlier and was ignored while Pacemaker was busy running the db2hadr_promote() action.
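
To check the 10-second cadence against the attached logs, something like the following could be used. It is a rough sketch, not part of the agent or of Pacemaker: the line pattern is inferred only from the three log lines in this report, the year is an assumption because the log timestamps omit it, and `failure_times` and `gaps_in_seconds` are hypothetical helper names.

```python
import re
from datetime import datetime

# Assumed line shape, inferred from the excerpt in this report:
#   "Sep 16 15:00:11 db2hadr(<resource>)[<pid>]: <LEVEL>: <action>: <message>"
LOG_LINE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2}) "
    r"db2hadr\((?P<resource>[^)]+)\)\[(?P<pid>\d+)\]: "
    r"(?P<level>[A-Z]+): (?P<action>\w+): (?P<message>.*)"
)


def failure_times(lines, year=2024):
    """Return timestamps of ERROR lines that report rc=7 (OCF 'not running')."""
    times = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match["level"] == "ERROR" and "rc=7" in match["message"]:
            stamp = f"{year} {match['ts']}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Return the spacing between consecutive failure reports, in seconds."""
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


# The one rc=7 line quoted in this report.
sample = [
    "Sep 16 15:00:11 db2hadr(db2_regress1_regress1_HARA)[110540]: ERROR: "
    "demote: 737: regress1: 0: HARA: Instance is not up. "
    "db2hadr_instance_monitor() failed with rc=7, db2hadr_monitor() exit with rc=7.",
]
print(failure_times(sample))
```

Feeding it every db2hadr line from the attached archives and passing the result to `gaps_in_seconds` should show whether failures really repeat about every 10 seconds during the 300-second window.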

Expected results

Regardless of how long ago the failure was first reported, if the monitoring script is still reporting a failure, Pacemaker should act on it and recover the resource.

Actual results

Even though the monitoring script keeps reporting a failure, Pacemaker does not recover the resource.

Attachments:
pcmk-Thu-17-Oct-2024.tar.bz2, 246 kB, 2024/10/17 6:19 PM
pcmk-Wed-18-Sep-2024.tar.bz2, 7.25 MB, 2024/09/18 8:21 PM
pcmk-Wed-18-Sep-2024-srv-2.tar.bz2, 7.26 MB, 2024/09/18 8:21 PM

Assignee: Kenneth Gaillot

Reporter: Dongho Han

Contributors: Chris Feist, Gerry Sommerville, Lan Pham

Developer: Kenneth Gaillot

QA Contact: Cluster QE

Votes: 0

Watchers: 6

Created: 2024/09/18 8:25 PM

Updated: 2024/10/31 10:59 PM
