Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: rhel-10.0.z
Affects Version/s: rhel-8.9.0.z, rhel-8.10.z, rhel-9.3.0.z, rhel-9.4.z, rhel-9.5.z
Component/s: sap-cluster-connector
Labels:
- cee.next

Fixed in Build:
sap-cluster-connector-3.0.1-10.el10_0.1
Regression:
No
Severity:
Low

AssignedTeam:
rhel-sst-sap

Story Points:
None
Blocked:
False
Ready:
False
Blocked Reason:

None
Product Documentation Required:
None
Sprint:
None
Release Blocker:
Approved Blocker
Target Backport Versions:

rhel-8.10, rhel-9.4, rhel-9.5, rhel-9.6

Git Pull Request:
https://github.com/redhat-sap/sap_cluster_connector/pull/1
Preliminary Testing:
Pass
Testable Builds:
https://kojihub.stream.rdu2.redhat.com/kojifiles/work/tasks/657/5430657/sap-cluster-connector-3.0.1-10.el10.noarch.rpm
Errata Link:
https://errata.engineering.redhat.com/advisory/150095
Test Coverage:
None

ProdDocsReview-CCS:
Unspecified
ProdDocsReview-Dev:
Unspecified
ProdDocsReview-QE:
Unspecified

Experience:

PX Impact Score:
PX Review Complete:
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Planning:
None

What were you trying to do that didn't work?

Customer was operating the cluster as normal, but after software updates `sap-cluster-connector` no longer detected maintenance-mode ( MM ).

What is the impact of this issue to you?

Customer uses the below process for stopping / starting SAP system:

issue command "pcs property set maintenance-mode=true" on cluster controller
issue command "sapcontrol -nr 51 -function StopSystem ALL" on node running scs instance
wait for SAP stop with command "sapcontrol -nr 00 -function WaitforStopped 1500 10" on node running scs instance
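For reference, the three steps above could be wrapped roughly as follows. This is only a sketch: the `run` helper and the `INSTANCE_NR` and `DRY_RUN` variables are hypothetical names added for illustration, the report itself shows two different instance numbers ( 51 and 00 ), and the script defaults to printing the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the customer's stop procedure. INSTANCE_NR and DRY_RUN are
# hypothetical knobs for illustration; the commands come from the report.
INSTANCE_NR="${INSTANCE_NR:-00}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # In dry-run mode only print the command, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

stop_sap_system() {
    # Step 1 runs on the cluster controller, steps 2 and 3 on the node
    # running the scs instance; each step only runs if the previous one succeeded.
    run pcs property set maintenance-mode=true &&
    run sapcontrol -nr "$INSTANCE_NR" -function StopSystem ALL &&
    run sapcontrol -nr "$INSTANCE_NR" -function WaitforStopped 1500 10
}
```

Calling `stop_sap_system` with the default `DRY_RUN=1` only lists the commands, which is a safe way to review the sequence before running it on a real cluster.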

Normally the resource using "sap_cluster_connector" would detect MM and go hands off during the above changes. In this case though, the customer describes the following behavior when the resource executes the /usr/bin/sap_cluster_connector hook:

SAPInstance resource is disabled - reports the cluster is not in maintenance (when it is showing maintenance)
hangs stopping the SAP system (sapcontrol waits at stopping and eventually times out)

Please provide the package NVR for which the bug is seen:

$ grep -e pacemaker-[0-9] -e sap-cluster-connector */installed-rpms
pacemaker-2.1.7-5.2.el9_4.x86_64                            Tue Feb 18 10:42:54 2025
sap-cluster-connector-3.0.1-7.el9.1.noarch                  Tue Feb 18 10:42:57 2025

How reproducible is this bug?:

100%

Steps to reproduce

Configure an SAP cluster with sap-cluster-connector on RHEL 9.2+, following the article:
How to enable the SAP HA Interface for SAP ABAP application server instances managed by the RHEL HA Add-On?

Place the cluster into maintenance-mode:

$ pcs property set maintenance-mode=true

Run an action against the application such as "sapcontrol -nr <##> -function Stop 1500 10" or manually run "sap_cluster_connector" ( the lsr option will show MM status ):

[root@clustera-rhel9 ~]# ./sap_cluster_connector --help | grep lsr
      lsr --out FILE --sid SID --ino INO

[root@clustera-rhel9 ~]# ./sap-cluster-connector-3.0.1-8.el9_0 lsr --out output.txt --sid JOS --ino 10

The maintenance-mode status will be reported to "/var/log/messages", and on RHEL 9.2+ this will report "not in maintenance" in all cases.

Expected results

We detect maintenance-mode ( example is RHEL 8 ):

$ tail /var/log/messages
-----------------------------------------8<----------------------------------------- 
Feb 19 12:11:40 beaver-11 sap-cluster-connector-3.0.1-8.el9_0[2874733]: lsr call (sid=JOS,ino=10,out=output.txt)
Feb 19 12:11:40 beaver-11 sap-cluster-connector-3.0.1-8.el9_0[2874733]: resource jos_ers10 is in maintanance mode
Feb 19 12:11:40 beaver-11 sap-cluster-connector-3.0.1-8.el9_0[2874733]: check_maintenance_mode call (retcode=0)

Actual results

We do not detect maintenance-mode ( example is RHEL 9.4 ):

$ tail /var/log/messages
-----------------------------------------8<----------------------------------------- 
Feb 18 10:52:33 rhel9-node1 sap_cluster_connector[29552]: Resource JOS_SAP_ASCS00 with InstanceName=JOS_ASCS00_sapjos-scs found
Feb 18 10:52:33 rhel9-node1 sap_cluster_connector[29552]: resource JOS_SAP_ASCS00 is not in maintanance mode
Feb 18 10:52:33 rhel9-node1 sap_cluster_connector[29552]: resource JOS_SAP_ASCS00 is not in maintanance mode

Additional Notes:

The issue is caused by how the "crm_resource" command represents "maintenance-mode" which appears to change at some point in RHEL 9:

RHEL 8 showed "(unmanaged)" to reflect a cluster in MM:

[root@beaver-1 ~]# crm_resource
Full List of Resources:
  * Resource Group: jos_ASCS00_group (unmanaged):
    * jos_vip_ascs00    (ocf::heartbeat:IPaddr2):     Started (unmanaged)
    * jos_ascs00    (ocf::heartbeat:SAPInstance):     Started (unmanaged)
  * Resource Group: jos_ERS10_group (unmanaged):
    * jos_vip_ers10    (ocf::heartbeat:IPaddr2):     Started (unmanaged)
    * jos_ers10    (ocf::heartbeat:SAPInstance):     Started (unmanaged)
  * Clone Set: sap-filesystems-clone [sap-filesystems] (unmanaged):
    * Resource Group: sap-filesystems:0 (unmanaged):
      * jos_fs_sapmnt    (ocf::heartbeat:Filesystem):     Started (unmanaged)
      * jos_fs_sys    (ocf::heartbeat:Filesystem):     Started (unmanaged)
      * jos_fs_trans    (ocf::heartbeat:Filesystem):     Started (unmanaged)
    * Resource Group: sap-filesystems:1 (unmanaged):
      * jos_fs_sapmnt    (ocf::heartbeat:Filesystem):     Started (unmanaged)
      * jos_fs_sys    (ocf::heartbeat:Filesystem):     Started (unmanaged)
      * jos_fs_trans    (ocf::heartbeat:Filesystem):     Started (unmanaged)

RHEL 9 shows "(maintenance)" to reflect a cluster in MM:

[root@clustera-rhel9 ~]# crm_resource
Full List of Resources:
  * dummy1    (ocf:heartbeat:Dummy):     Started (maintenance)
  * postgres-vip    (ocf:heartbeat:IPaddr2):     Stopped (maintenance)
  * Clone Set: postgresdb-clone [postgresdb] (promotable, maintenance):
    * Stopped: [ clustera-rhel9 clusterb-rhel9 ]
  * ip_10_0_0_50    (ocf:heartbeat:IPaddr2):     Started (maintenance)

The above change in pacemaker behavior breaks the `sap-cluster-connector` script, because the "check_maintanance_mode" function in this script currently only looks for the "unmanaged" keyword ( in all versions ):

# Check of a few "sap-cluster-connector" versions I had available:
$ grep -n cmd_crm_resource.*unmanaged sap-cluster-connector*
sap-cluster-connector-3.0.1-5.el8:305:    my $retcode = system("$cmd_crm_resource | grep -E \"\[\[:space:\]\]$res\[\[:space:\]\]\" | grep -q unmanaged");
sap-cluster-connector-3.0.1-7.el9.1:305:    my $retcode = system("$cmd_crm_resource | grep -E \"\[\[:space:\]\]$res\[\[:space:\]\]\" | grep -q unmanaged");
sap-cluster-connector-3.0.1-8.el8_6:305:    my $retcode = system("$cmd_crm_resource | grep -E \"\[\[:space:\]\]$res\[\[:space:\]\]\" | grep -q unmanaged");
sap-cluster-connector-3.0.1-8.el9_0:305:    my $retcode = system("$cmd_crm_resource | grep -E \"\[\[:space:\]\]$res\[\[:space:\]\]\" | grep -q unmanaged");

# Full function call:
$ cat sap-cluster-connector-3.0.1-8.el9_0 
299 sub check_maintanance_mode {
300         my ($res) = @_;
301         my $retcode;
302         $nowstring = localtime;
303         printf "%s : check_maintenance_mode($res)\n", $nowstring;
304 
305         my $retcode = system("$cmd_crm_resource | grep -E \"\[\[:space:\]\]$res\[\[:space:\]\]\" | grep -q unmanaged"); #<----
306         $retcode=$retcode >> 8;
307         if ( $retcode eq "0" ) {
308                 syslog("LOG_INFO", "resource %s is in maintanance mode\n", $res);
309         } else {
310                 syslog("LOG_INFO", "resource %s is not in maintanance mode\n", $res);
311         }
312         return $retcode;
313 }
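To illustrate why the function above returns non-zero on RHEL 9, the same filter can be run against one line from each of the two `crm_resource` listings. This is a minimal sketch: `old_check` is a hypothetical shell stand-in for the pipeline on line 305, and it reads canned output on stdin instead of calling `crm_resource`.

```shell
#!/bin/sh
# Sample lines copied from the RHEL 8 and RHEL 9 crm_resource listings.
rhel8_line='    * jos_ascs00    (ocf::heartbeat:SAPInstance):     Started (unmanaged)'
rhel9_line='  * dummy1    (ocf:heartbeat:Dummy):     Started (maintenance)'

# Same filter shape as line 305 of the connector: match the resource name
# surrounded by whitespace, then look only for the "unmanaged" keyword.
old_check() {
    # $1 = resource name, stdin = crm_resource-style output
    grep -E "[[:space:]]$1[[:space:]]" | grep -q unmanaged
}

printf '%s\n' "$rhel8_line" | old_check jos_ascs00 && echo "RHEL 8: in maintenance"
printf '%s\n' "$rhel9_line" | old_check dummy1 || echo "RHEL 9: not detected"
```

The second pipeline falls through to the `||` branch because the RHEL 9 line carries `(maintenance)` and never contains the string "unmanaged".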

So this "check_maintanance_mode" function within `sap-cluster-connector` likely needs to be expanded to match the change in `pacemaker` status output, which now reports `(maintenance)` instead of `(unmanaged)`. Customer in case 04064873 has noted they have made the below change to their own `sap-cluster-connector` script and this avoids the issue ( the change greps for both "unmanaged" and "maintenance" ):

L305: my $retcode = system("$cmd_crm_resource | grep -E \"\[\[:space:\]\]$res\[\[:space:\]\]\" | grep -E -q \"maintenance|unmanaged\"");
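The widened filter can be tried outside the Perl script with a small shell sketch. `in_maintenance` and `report` are hypothetical helper names, the function reads `crm_resource`-style output on stdin so no live cluster is needed, and the third sample line ( a managed resource ) is made up for illustration and not taken from this report.

```shell
#!/bin/sh
# Widened filter: accept either marker, as in the customer's change.
in_maintenance() {
    # $1 = resource name, stdin = crm_resource-style output
    grep -E "[[:space:]]$1[[:space:]]" | grep -E -q "maintenance|unmanaged"
}

report() {
    # $1 = resource name, $2 = one line of crm_resource-style output
    if printf '%s\n' "$2" | in_maintenance "$1"; then
        echo "$1: in maintenance"
    else
        echo "$1: not in maintenance"
    fi
}

report jos_ascs00 '    * jos_ascs00    (ocf::heartbeat:SAPInstance):     Started (unmanaged)'
report dummy1 '  * dummy1    (ocf:heartbeat:Dummy):     Started (maintenance)'
# Hypothetical line for a managed resource, not taken from the report.
report dummy1 '  * dummy1    (ocf:heartbeat:Dummy):     Started clustera-rhel9'
```

One caveat with this approach: the keyword match applies to the whole line, so a resource line that contains the word "maintenance" elsewhere would also be reported as being in MM.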

I believe below is the upstream pacemaker commit which changed the report from `(unmanaged)` to `(maintenance)` in status checks.

Feature: libpe_status: crm_mon shows "maintenance" for rsc maint meta · ClusterLabs/pacemaker@87ad7d1

I am having trouble pinpointing exactly where this change made it into the RH package, but I would guess it was the rebase indicated below:

pacemaker-2.1.8-3.el9.x86_64.rpm - Change log

2024-05-22 Chris Lumens <clumens@redhat.com> - 2.1.8-1
- Rebase on upstream 2.1.8-rc1 release 
- Fix a typo in the help output of pacemaker-fenced
- Fix escaping characters in XML attribute output
- Resolves: RHEL-25819
- Resolves: RHEL-30822
2024-03-21 Chris Lumens <clumens@redhat.com> - 2.1.7-5 <--- customer hit issue here
- Fix upgrading to this package on multilib systems
- Resolves: RHEL-28999
2024-01-31 Chris Lumens <clumens@redhat.com> - 2.1.7-4
- Properly validate attribute set type in pacemaker-attrd
- Fix `crm_attribute -t nodes --node localhost`
- Resolves: RHEL-13216
- Resolves: RHEL-17225
- Resolves: RHEL-23498
2024-01-16 Chris Lumens <clumens@redhat.com> - 2.1.7-3 <--- I am guessing MM report change was introduced here
- Rebase on upstream 2.1.7 final release                
- Fix documentation for Pacemaker Remote schema transfers
- Do not check CIB feature set version when CIB_file is set
- Consolidate attrd cache handling
- Avoid duplicating option metadata across daemons
- Related: RHEL-7665
- Related: RHEL-13216
- Resolves: RHEL-7702

links to

RHBA-2025:150095 sap-cluster-connector update

The sap-cluster-connector script no longer detects "maintenance_mode"

Assignee: Janine Fuchs
Reporter: Joshua Baker
Developer: Janine Fuchs
QA Contact: Amir Memon
Votes: 1
Watchers: 8
Created: 2025/02/19 5:30 PM
Updated: 2025/09/13 4:22 AM
Resolved: 2025/06/24 5:10 AM
Next Planned Release Date: 2025/06/24
Release Date: 2025/06/24
