Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: rhel-9.6
Affects Version/s: rhel-8.6.0, rhel-8.8.0, rhel-8.10, rhel-9.0.0, rhel-9.2.0, rhel-9.4, rhel-9.6, rhel-10.0
Component/s: resource-agents-sap-hana
Labels:
None

Regression:
Yes
Severity:
Important
Epic Link:
SAPOCP-1330
Keywords:

ZStream, Patch

AssignedTeam:
rhel-sst-sap

Internal Target Milestone:
22
Story Points:
None
Target Version:

rhel-9.6, rhel-10.0
Blocked:
False
Ready:
False
Blocked Reason:

None
Product Documentation Required:
None
Products:

Red Hat Enterprise Linux for SAP Solutions
Sprint:
None
Release Blocker:
Approved Blocker
Target Backport Versions:

rhel-8.6.0.z, rhel-8.8.0.z, rhel-8.10.z, rhel-9.0.0.z, rhel-9.2.0.z, rhel-9.4.z

Preliminary Testing:
Requested
Testable Builds:

https://kojihub.stream.rdu2.redhat.com/kojifiles/work/tasks/2319/4642319/resource-agents-sap-hana-0.162.3-5.el9.noarch.rpm
Test Coverage:
None

Experience:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Planning:
None

As reported by Microsoft:

What were you trying to do that didn't work?

[...]

When we try to start the cluster, or a node tries to rejoin the cluster after being fenced, the cluster tries to stop the SAPHanaTopology resource on the joining node. Because the file systems are managed by the cluster, the stop action on SAPHanaTopology fails while the hana_shared resource is not yet started, which results in the node being fenced. The issue repeats when the node reboots and tries to rejoin the cluster, so the node can end up in a fencing loop.

INVESTIGATION & ROOT CAUSE OF THE ISSUE

We believe the problem is that the SAPHanaTopology RA contains two functions with the same name, “sht_monitor()”. Here’s what happens:

[...]

The issue is easily noticeable in SAP HANA scale-up, as the file systems are on an NFS share and are managed by the cluster.
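The duplicate-name behavior described above can be reproduced in isolation. The following is a minimal sketch, not code from the resource agent: only the function name sht_monitor comes from the report, while both function bodies and their return codes are placeholders invented for illustration. It shows that a shell keeps only the most recent definition of a function, so a second definition silently replaces the first.

```shell
# Placeholder first definition (body is an assumption, not the agent's code).
sht_monitor() {
    echo "first definition"
    return 7   # OCF_NOT_RUNNING in the OCF return code convention
}

# A second definition with the same name replaces the first one;
# the shell emits no warning or error when this happens.
sht_monitor() {
    echo "second definition"
    return 0   # OCF_SUCCESS in the OCF return code convention
}

# Every caller now reaches the second definition only.
monitor_output=$(sht_monitor)
monitor_rc=$?
echo "output=${monitor_output} rc=${monitor_rc}"
```

Under these assumptions, any code path that expected the behavior of the first definition gets the result of the second one instead, which is one way a probe could report a state other than "not running".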

Please provide the package NVR for which the bug is seen:

SAPHanaTopology is a resource agent that ships as part of resource-agents-sap-hana. The impacted RA version is SAPHanaTopologyVersion="0.162.3". We are not sure whether the issue is present in older SAPHanaTopology versions.

How reproducible is this bug?:

Highly reproducible, although it sometimes works depending on timing. If the cluster manages to mount "hana_shared" before the SAPHanaTopology stop operation is initiated, everything works.

Steps to reproduce

Configure SAP HANA scale-up with NFS shares as described in "High availability of SAP HANA scale-up with Azure NetApp Files on RHEL" (Microsoft Learn, https://learn.microsoft.com/en-us/azure/sap/workloads/sap-hana-high-availability-netapp-files-red-hat?tabs=lb-portal) and in "How do I configure SAP HANA Scale-Up System Replication in a Pacemaker cluster when the HANA filesystems are on NFS shares?" (Red Hat Customer Portal).

Stop the cluster on one node and start the cluster again.

Expected results

When the cluster is started on the node, or the node tries to rejoin the cluster, the probe operation for SAPHanaTopology should return "not running", and the cluster should not request a stop operation for SAPHanaTopology. The first call to SAPHanaTopology should be the start operation, because order constraints are defined for the start operation so that SAPHanaTopology only starts after the NFS mount resources are started.
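The expected probe result can be illustrated with a small sketch. This is not the fix shipped in the package: the helper name probe_topology and the directory paths are hypothetical, and the only grounded parts are the OCF return codes (0 for success, 7 for "not running"). The sketch assumes that a missing shared directory means the NFS share is not mounted yet.

```shell
# Return codes from the OCF resource agent convention.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical probe helper: report "not running" when the shared directory
# is absent. A real agent would go on to check the actual resource state
# instead of returning success unconditionally.
probe_topology() {
    shared_dir="$1"
    if [ ! -d "$shared_dir" ]; then
        return "$OCF_NOT_RUNNING"
    fi
    return "$OCF_SUCCESS"
}

# Path that does not exist: stands in for an unmounted NFS share.
probe_topology "/nonexistent/hana/shared"
missing_rc=$?

# Path that always exists: stands in for a mounted share.
probe_topology "/"
present_rc=$?

echo "missing=${missing_rc} present=${present_rc}"
```

When a probe reports "not running", the cluster has no reason to schedule a stop for that resource on the joining node, so the first operation can be the start that the order constraints already gate on the NFS mounts.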

is cloned by

RHEL-59661 Fix regression in SAPHanaTopology [rhel-10.0]

Closed

Assignee: Janine Fuchs

Reporter: Janine Fuchs

Developer: Janine Fuchs

QA Contact: Amir Memon

Votes: 0

Watchers: 5

Created: 2024/09/20 9:46 AM

Updated: 2024/12/10 4:32 PM

Resolved: 2024/12/10 4:32 PM

Target end: 2025/01/20
