- Bug
- Resolution: Unresolved
- Minor
- None
- rhos-17.1.4
- None
- False
- False
- ?
- None
- PIDONE 18.0.5
- 1
- Moderate
To Reproduce
We experimented with the timeout value for fence_ipmilan devices in our lab to check whether it is applied properly, and found the following:
- with the default 2 retries (pcmk_monitor_retries) and a 120sec timeout (the same as the customer's config), the monitor fails after 57sec
* ipmi 10s-interval monitor on fastvm-rhel-9-2-78 returned 'error' at Sun Jan 26 17:10:22 2025 after 57.383s
- 3 retries at a 120sec timeout result in failure after approx. 1min 40sec (in my case, that is 3 executions of ipmitool)
* ipmi 10s-interval monitor on fastvm-rhel-9-2-78 returned 'error' at Sun Jan 26 17:12:32 2025 after 1m26.531s
- 4 retries don't change this, as the 4th retry is no longer attempted due to the abovementioned logic of 30% remaining time
* ipmi 10s-interval monitor on fastvm-rhel-9-2-78 returned 'error' at Sun Jan 26 17:14:41 2025 after 1m26.544s
- but prolonging the timeout to 180sec lets another retry go through, and the overall leeway of the operation grows to close to 2 minutes
* ipmi 10s-interval monitor on fastvm-rhel-9-2-78 returned 'error' at Sun Jan 26 17:17:08 2025 after 1m55.751s
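The four results above can be roughly reproduced with a small model of the retry logic. This is only a sketch under stated assumptions, not the actual fencer code: it assumes a further attempt is started only while fewer than pcmk_monitor_retries attempts have been made and less than 70% of the timeout has elapsed (i.e. at least 30% remains), and it assumes every failing ipmitool run takes about 28.8s (estimated from 57.383s / 2 in the first log line). The function name `simulate_monitor` and its parameters are hypothetical.

```python
def simulate_monitor(timeout_s, max_retries, attempt_s):
    """Model a failing fence-device monitor under the assumed retry rule.

    Assumptions (not taken from the fencer source):
      * every attempt fails after `attempt_s` seconds
      * another attempt is only started if fewer than `max_retries`
        attempts were made AND less than 70% of `timeout_s` has elapsed
    Returns (number_of_attempts, elapsed_seconds).
    """
    elapsed = 0.0
    tries = 0
    while True:
        tries += 1
        elapsed += attempt_s
        if tries >= max_retries:
            break  # retry budget (pcmk_monitor_retries) exhausted
        if elapsed >= 0.7 * timeout_s:
            break  # less than 30% of the timeout is left, no further retry
    return tries, elapsed


if __name__ == "__main__":
    ATTEMPT_S = 28.8  # assumed duration of one failing ipmitool run
    for timeout_s, retries in [(120, 2), (120, 3), (120, 4), (180, 4)]:
        tries, elapsed = simulate_monitor(timeout_s, retries, ATTEMPT_S)
        print(f"timeout={timeout_s}s retries={retries}: "
              f"{tries} attempts, failed after {elapsed:.1f}s")
```

Under these assumptions the model stops after 3 attempts for both the 3-retry and 4-retry cases at 120sec, and only allows a 4th attempt once the timeout is raised to 180sec, which lines up with the 57s, 1m26s, 1m26s and 1m55s figures in the logs.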
Expected behavior
I'd expect that increasing the timeout value for monitor operations would work without extra steps and wouldn't require bumping pcmk_monitor_retries. https://issues.redhat.com/browse/OSPRH-13557 may be considered a related issue.