RHEL / RHEL-80231

The sap-cluster-connector script no longer detects maintenance-mode (RHEL 9.2+) [rhel-10]


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • rhel-10.0.z
    • rhel-8.9.0.z, rhel-8.10.z, rhel-9.3.0.z, rhel-9.4.z, rhel-9.5.z
    • sap-cluster-connector
    • sap-cluster-connector-3.0.1-10.el10_0.1
    • No
    • Low
    • rhel-sst-sap
    • None
    • False
    • False
    • None
    • None
    • Approved Blocker
    • Unspecified
    • Unspecified
    • Unspecified
    • None

      What were you trying to do that didn't work?

      The customer was trying to operate the cluster as normal, but after software updates they experienced issues with `sap-cluster-connector`, which no longer detected maintenance-mode (MM).

      What is the impact of this issue to you?

      The customer uses the process below for stopping/starting the SAP system:

      • Issue the command "pcs property set maintenance-mode=true" on the cluster controller.
      • Issue the command "sapcontrol -nr 51 -function StopSystem ALL" on the node running the SCS instance.
      • Wait for SAP to stop with the command "sapcontrol -nr 00 -function WaitforStopped 1500 10" on the node running the SCS instance.

      Normally the resource using "sap_cluster_connector" would detect MM and stay hands-off during the above changes. In this case, however, the customer describes the following behavior when the resource executes the /usr/bin/sap_cluster_connector hook:

      • The SAPInstance resource is disabled, and the cluster is reported as not in maintenance (even though it shows maintenance).
      • Stopping the SAP system hangs (sapcontrol waits in the stopping state and eventually times out).

      Please provide the package NVR for which the bug is seen:

       

      $ grep -e pacemaker-[0-9] -e sap-cluster-connector */installed-rpms
      pacemaker-2.1.7-5.2.el9_4.x86_64                            Tue Feb 18 10:42:54 2025
      sap-cluster-connector-3.0.1-7.el9.1.noarch                  Tue Feb 18 10:42:57 2025
      

       

      How reproducible is this bug?:

      100%

      Steps to reproduce

       

      1. Configure an SAP cluster with sap-cluster-connector configured in RHEL 9.2+:
        1. How to enable the SAP HA Interface for SAP ABAP application server instances managed by the RHEL HA Add-On?
      2. Place the cluster into maintenance-mode:
        1. $ pcs property set maintenance-mode=true
      3. Run an action against the application, such as "sapcontrol -nr <##> -function Stop 1500 10", or run "sap_cluster_connector" manually (the lsr option shows the MM status):
        1. [root@clustera-rhel9 ~]# ./sap_cluster_connector --help | grep lsr
                lsr --out FILE --sid SID --ino INO
          
          [root@clustera-rhel9 ~]# ./sap-cluster-connector-3.0.1-8.el9_0 lsr --out output.txt --sid JOS --ino 10

      The maintenance-mode status is reported to "/var/log/messages", and on RHEL 9.2+ it reports "not in maintenance" in all cases.
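When comparing several nodes or sosreports, the verdict can be pulled out of the log text with a small shell helper. This is a sketch only: `last_mm_verdict` is a hypothetical name, and the pattern assumes the syslog wording used by `check_maintanance_mode` (including its "maintanance" spelling), as seen in the log excerpts in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of sap-cluster-connector): print the most
# recent maintenance-mode verdict logged for one resource, as "in" or
# "not in". Log text is read from stdin, e.g. /var/log/messages.
# Assumes the syslog wording of check_maintanance_mode, including the
# script's own "maintanance" spelling.
last_mm_verdict() {
    res="$1"
    # Capture "in" or "not in" from each matching message; keep the last one.
    sed -n -E "s/.*resource ${res} is (not in|in) maintanance mode.*/\1/p" |
        tail -n 1
}
```

Usage: `last_mm_verdict JOS_SAP_ASCS00 < /var/log/messages`. On the RHEL 9.4 excerpt under "Actual results" this would print "not in"; it prints nothing if the resource never appears.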

       

      Expected results

      Maintenance-mode is detected (example from RHEL 8):

       

      $ tail /var/log/messages
      -----------------------------------------8<----------------------------------------- 
      Feb 19 12:11:40 beaver-11 sap-cluster-connector-3.0.1-8.el9_0[2874733]: lsr call (sid=JOS,ino=10,out=output.txt)
      Feb 19 12:11:40 beaver-11 sap-cluster-connector-3.0.1-8.el9_0[2874733]: resource jos_ers10 is in maintanance mode
      Feb 19 12:11:40 beaver-11 sap-cluster-connector-3.0.1-8.el9_0[2874733]: check_maintenance_mode call (retcode=0)
      

      Actual results

      Maintenance-mode is not detected (example from RHEL 9.4):

       

      $ tail /var/log/messages
      -----------------------------------------8<----------------------------------------- 
      Feb 18 10:52:33 rhel9-node1 sap_cluster_connector[29552]: Resource JOS_SAP_ASCS00 with InstanceName=JOS_ASCS00_sapjos-scs found
      Feb 18 10:52:33 rhel9-node1 sap_cluster_connector[29552]: resource JOS_SAP_ASCS00 is not in maintanance mode
      Feb 18 10:52:33 rhel9-node1 sap_cluster_connector[29552]: resource JOS_SAP_ASCS00 is not in maintanance mode
      

       

      Additional Notes:

      The issue is caused by a change in how the "crm_resource" command represents maintenance-mode, which appears to have happened at some point in RHEL 9:

       

      RHEL 8 shows "(unmanaged)" to reflect a cluster in MM:

       

      [root@beaver-1 ~]# crm_resource
      Full List of Resources:
        * Resource Group: jos_ASCS00_group (unmanaged):
          * jos_vip_ascs00    (ocf::heartbeat:IPaddr2):     Started (unmanaged)
          * jos_ascs00    (ocf::heartbeat:SAPInstance):     Started (unmanaged)
        * Resource Group: jos_ERS10_group (unmanaged):
          * jos_vip_ers10    (ocf::heartbeat:IPaddr2):     Started (unmanaged)
          * jos_ers10    (ocf::heartbeat:SAPInstance):     Started (unmanaged)
        * Clone Set: sap-filesystems-clone [sap-filesystems] (unmanaged):
          * Resource Group: sap-filesystems:0 (unmanaged):
            * jos_fs_sapmnt    (ocf::heartbeat:Filesystem):     Started (unmanaged)
            * jos_fs_sys    (ocf::heartbeat:Filesystem):     Started (unmanaged)
            * jos_fs_trans    (ocf::heartbeat:Filesystem):     Started (unmanaged)
          * Resource Group: sap-filesystems:1 (unmanaged):
            * jos_fs_sapmnt    (ocf::heartbeat:Filesystem):     Started (unmanaged)
            * jos_fs_sys    (ocf::heartbeat:Filesystem):     Started (unmanaged)
            * jos_fs_trans    (ocf::heartbeat:Filesystem):     Started (unmanaged) 
      

       

      RHEL 9 shows "(maintenance)" to reflect a cluster in MM:

      [root@clustera-rhel9 ~]# crm_resource
      Full List of Resources:
        * dummy1    (ocf:heartbeat:Dummy):     Started (maintenance)
        * postgres-vip    (ocf:heartbeat:IPaddr2):     Stopped (maintenance)
        * Clone Set: postgresdb-clone [postgresdb] (promotable, maintenance):
          * Stopped: [ clustera-rhel9 clusterb-rhel9 ]
        * ip_10_0_0_50    (ocf:heartbeat:IPaddr2):     Started (maintenance)
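To see the difference between the two listings at a glance, the trailing status annotation can be extracted per resource. A minimal sketch: `resource_annotation` is a hypothetical helper that reads `crm_resource` output on stdin, reuses the connector's whitespace-delimited name match, and assumes the annotation is the last parenthesized text on the resource's line, as in both listings above.

```shell
#!/bin/sh
# Hypothetical helper: print the trailing "(...)" annotation on the status
# line of one resource, e.g. "unmanaged" (RHEL 8 listing) or "maintenance"
# (RHEL 9 listing). crm_resource output is read from stdin.
# Line selection matches the connector: the name surrounded by whitespace.
# The sed step keeps only a parenthesized group that ends the line
# (trailing blanks allowed); head keeps the first matching line.
resource_annotation() {
    res="$1"
    grep -E "[[:space:]]${res}[[:space:]]" |
        sed -n -E 's/.*\(([^()]*)\)[[:space:]]*$/\1/p' |
        head -n 1
}
```

For example, `crm_resource | resource_annotation jos_ers10` would print "unmanaged" on the RHEL 8 cluster shown above, while `crm_resource | resource_annotation dummy1` would print "maintenance" on the RHEL 9 cluster.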

       

      This change in pacemaker behavior is a problem for the `sap-cluster-connector` script, because its "check_maintanance_mode" function is written to look only for the "unmanaged" keyword (in all versions checked):

       

      # Check of a few "sap-cluster-connector" versions I had available:
      $ grep -n cmd_crm_resource.*unmanaged sap-cluster-connector*
      sap-cluster-connector-3.0.1-5.el8:305:    my $retcode = system("$cmd_crm_resource | grep -E \"\[\[:space:\]\]$res\[\[:space:\]\]\" | grep -q unmanaged");
      sap-cluster-connector-3.0.1-7.el9.1:305:    my $retcode = system("$cmd_crm_resource | grep -E \"\[\[:space:\]\]$res\[\[:space:\]\]\" | grep -q unmanaged");
      sap-cluster-connector-3.0.1-8.el8_6:305:    my $retcode = system("$cmd_crm_resource | grep -E \"\[\[:space:\]\]$res\[\[:space:\]\]\" | grep -q unmanaged");
      sap-cluster-connector-3.0.1-8.el9_0:305:    my $retcode = system("$cmd_crm_resource | grep -E \"\[\[:space:\]\]$res\[\[:space:\]\]\" | grep -q unmanaged");

       

      # Full function definition:
      $ cat sap-cluster-connector-3.0.1-8.el9_0 
      299 sub check_maintanance_mode {
      300         my ($res) = @_;
      301         my $retcode;
      302         $nowstring = localtime;
      303         printf "%s : check_maintenance_mode($res)\n", $nowstring;
      304 
      305         my $retcode = system("$cmd_crm_resource | grep -E \"\[\[:space:\]\]$res\[\[:space:\]\]\" | grep -q unmanaged"); #<----
      306         $retcode=$retcode >> 8;
      307         if ( $retcode eq "0" ) {
      308                 syslog("LOG_INFO", "resource %s is in maintanance mode\n", $res);
      309         } else {
      310                 syslog("LOG_INFO", "resource %s is not in maintanance mode\n", $res);
      311         }
      312         return $retcode;
      313 }
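The pipeline on line 305 can be reproduced outside Perl to confirm the diagnosis. Perl's `system()` passes this string to `/bin/sh`, and inside the Perl double-quoted string `\[` and `\]` are plain brackets, so the shell ends up running the `grep -E` in the sketch. Here `original_check` is a hypothetical name, and the `$cmd_crm_resource` output is supplied on stdin rather than by running the command.

```shell
#!/bin/sh
# Shell equivalent of the pipeline on line 305 of check_maintanance_mode.
# Exit status 0 is what the connector treats as "in maintanance mode".
# crm_resource output is read from stdin instead of running crm_resource.
original_check() {
    res="$1"
    grep -E "[[:space:]]${res}[[:space:]]" | grep -q unmanaged
}
```

Fed the RHEL 8 line for jos_ers10 it succeeds; fed the RHEL 9 line for dummy1, which carries "(maintenance)", it fails, which matches the "not in maintanance mode" messages in this report.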

       

      The "check_maintanance_mode" function in `sap-cluster-connector` therefore likely needs to be extended to match the change in `pacemaker` status output, which now reports `(maintenance)` instead of `(unmanaged)`. The customer in case 04064873 made the change below to their own copy of the `sap-cluster-connector` script, and this avoids the issue (the change greps for both "unmanaged" and "maintenance"):

       

      L305: my $retcode = system("$cmd_crm_resource | grep -E \"\[\[:space:\]\]$res\[\[:space:\]\]\" | grep -E -q \"maintenance|unmanaged\"");
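The customer's patched line can be checked the same way. A minimal sketch, with `patched_check` as a hypothetical name and `crm_resource` output again read from stdin: it accepts either annotation and still rejects a resource line that carries neither.

```shell
#!/bin/sh
# Shell equivalent of the customer's patched line 305: a resource counts as
# being in maintenance mode if its status line contains "maintenance"
# (the newer output seen on RHEL 9) or "unmanaged" (the older output).
# crm_resource output is read from stdin.
patched_check() {
    res="$1"
    grep -E "[[:space:]]${res}[[:space:]]" | grep -E -q "maintenance|unmanaged"
}
```

Like the original, this still scrapes human-readable text, so it depends on `crm_resource` keeping these words in its status output, and a resource whose name itself contains "maintenance" or "unmanaged" would also match.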

      I believe the upstream pacemaker commit below is the one that changed the status output from `(unmanaged)` to `(maintenance)`.

       

       

      I am having trouble pinpointing exactly where this change entered the RH package, but my guess is the rebase indicated below:

       

      2024-05-22 Chris Lumens <clumens@redhat.com> - 2.1.8-1
      - Rebase on upstream 2.1.8-rc1 release 
      - Fix a typo in the help output of pacemaker-fenced
      - Fix escaping characters in XML attribute output
      - Resolves: RHEL-25819
      - Resolves: RHEL-30822
      2024-03-21 Chris Lumens <clumens@redhat.com> - 2.1.7-5 <--- customer hit issue here
      - Fix upgrading to this package on multilib systems
      - Resolves: RHEL-28999
      2024-01-31 Chris Lumens <clumens@redhat.com> - 2.1.7-4
      - Properly validate attribute set type in pacemaker-attrd
      - Fix `crm_attribute -t nodes --node localhost`
      - Resolves: RHEL-13216
      - Resolves: RHEL-17225
      - Resolves: RHEL-23498
      2024-01-16 Chris Lumens <clumens@redhat.com> - 2.1.7-3 <--- I am guessing MM report change was introduced here
      - Rebase on upstream 2.1.7 final release                
      - Fix documentation for Pacemaker Remote schema transfers
      - Do not check CIB feature set version when CIB_file is set
      - Consolidate attrd cache handling
      - Avoid duplicating option metadata across daemons
      - Related: RHEL-7665
      - Related: RHEL-13216
      - Resolves: RHEL-7702
      

       

              Janine Fuchs (jfuchs@redhat.com)
              Joshua Baker (rhn-support-jobaker)
              Janine Fuchs
              Amir Memon
              Votes: 1
              Watchers: 8
