- Bug
- Resolution: Unresolved
- Undefined
- rhel-8.6.0, rhel-8.8.0, rhel-8.10, rhel-9.0.0, rhel-9.2.0, rhel-9.4, rhel-9.6, rhel-10.0
- None
- resource-agents-sap-hana-0.162.3-5.el10
- Yes
- Important
- Patch
- rhel-sst-sap
- 15
- None
- False
- None
- Red Hat Enterprise Linux for SAP Solutions
- None
- None
As reported by Microsoft:
What were you trying to do that didn't work?
[...]
When we try to start the cluster, or a node tries to rejoin the cluster after being fenced, the cluster tries to stop the SAPHanaTopology resource on the joining node. Because the file systems are managed by the cluster, the stop action on SAPHanaTopology fails while the hana_shared resource is not yet started, which results in node fencing. The issue repeats when the node reboots and tries to rejoin the cluster, so the node can end up in a fencing loop.
INVESTIGATION & ROOT CAUSE OF THE ISSUE
We believe the problem is that the SAPHanaTopology RA contains two functions with the same name, “sht_monitor()”. Here’s what happens:
[...]
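The general shell behavior behind a duplicate function name can be shown with a minimal sketch. This is not the resource agent's code: `demo_monitor` and its return values are hypothetical stand-ins, chosen only to show that when a shell script defines the same function twice, the later definition silently replaces the earlier one.

```shell
#!/bin/bash
# Hypothetical stand-ins; not taken from the SAPHanaTopology agent.

# First definition: reports "not running" (OCF_NOT_RUNNING is 7).
demo_monitor() {
    return 7
}

# Second definition with the same name: replaces the first without any warning.
demo_monitor() {
    return 0
}

# Only the last definition is in effect at call time.
demo_monitor
demo_rc=$?
echo "demo_monitor returned $demo_rc"
```

Running `declare -f demo_monitor` after both definitions shows only the second body, which is one way to check which definition a script actually ends up with.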
The issue is easily noticeable in SAP HANA scale-up setups where the file systems are on NFS shares and are managed by the cluster.
Please provide the package NVR for which the bug is seen:
SAPHanaTopology is a resource agent that ships as part of resource-agents-sap-hana. The impacted RA version is SAPHanaTopologyVersion="0.162.3". We are not sure whether the issue is present in older SAPHanaTopology versions.
How reproducible is this bug?:
Highly reproducible, but timing-dependent. If the cluster manages to mount "hana_shared" before the SAPHanaTopology stop operation is initiated, the problem does not occur.
Steps to reproduce
- Configure SAP HANA scale-up with NFS shares as described in [High availability of SAP HANA scale-up with Azure NetApp Files on RHEL | Microsoft Learn|https://learn.microsoft.com/en-us/azure/sap/workloads/sap-hana-high-availability-netapp-files-red-hat?tabs=lb-portal] and in the Red Hat Customer Portal article "How do I configure SAP HANA Scale-Up System Replication in a Pacemaker cluster when the HANA filesystems are on NFS shares?".
- Stop the cluster on one node and start the cluster again.
Expected results
When the cluster is started on the node, or the node tries to rejoin the cluster, two things are expected. First, the probe operation for SAPHanaTopology should return "not running". Second, the cluster should not request a stop operation for SAPHanaTopology. The first call to SAPHanaTopology should be a start operation, and because order constraints are defined for the start operation, SAPHanaTopology should start only after the NFS mount resources have started.
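As an illustration of the expected probe result, here is a minimal sketch. It is not the agent's implementation or the actual fix: `demo_probe` and the directory argument are hypothetical, and only the return codes (0 for OCF_SUCCESS, 7 for OCF_NOT_RUNNING) follow the OCF resource agent convention. The idea is that a probe which cannot see the shared file system reports "not running" rather than an error, so the cluster has no reason to schedule a stop.

```shell
#!/bin/bash
# OCF return codes used by Pacemaker resource agents.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical probe helper; $1 is a directory expected on the shared mount.
demo_probe() {
    local shared_dir="$1"
    if [ ! -d "$shared_dir" ]; then
        # Shared file system is not available yet, so report "not running".
        return "$OCF_NOT_RUNNING"
    fi
    return "$OCF_SUCCESS"
}

# Example call with a path that is assumed not to exist on this machine.
demo_probe "/nonexistent/hana_shared_demo"
echo "probe rc: $?"
```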
- clones
- RHEL-59660 Fix regression in SAPHanaTopology [rhel-9.6]
- In Progress
- links to
- RHBA-2024:139461 resource-agents-sap-hana update