Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: rhel-8.10
Affects Version/s: rhel-8.9.0
Component/s: sanlock
Labels:
None

Fixed in Build:
sanlock-3.8.4-5.el8
Regression:
None
Severity:
Important
AssignedTeam:
rhel-storage-lvm
Sub-System Group:
ssg_filesystems_storage_and_HA

Dev Target Milestone:
20
Internal Target Milestone:
20
Story Points:
2
ACKs Check:
QE ack, Dev ack
Blocked:
False
Ready:
False
Blocked Reason:
None
Product Documentation Required:
None
Sprint:
None

Preliminary Testing:
Pass
Errata Link:
https://errata.engineering.redhat.com/advisory/128251
Test Coverage:
RegressionOnly

Experience:
Architecture:
All

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Planning:
None

What were you trying to do that didn't work?

sanlock uses the system watchdog for data protection in shared storage (SAN) environments. If the watchdog does not reset the machine after the configured timeout (60 seconds), then in certain scenarios two hosts can access the same storage simultaneously and corrupt data. One of the simplest ways this can happen is for the sanlock daemon to be killed while it is managing leases for an application.

It was recently discovered that the iTCO_wdt watchdog driver does not reset the machine after the specified timeout (60 seconds); instead, it resets the machine after two timeout periods (120 seconds). Therefore, sanlock (via its wdmd daemon) must set the watchdog timeout to half of the required value. For example, the iTCO_wdt timeout must be set to 30 seconds for the machine to be reset at 60 seconds.

Although this looks like a bug in iTCO_wdt, it is the intended behavior according to the hardware specification:

https://uefi.org/sites/default/files/resources/Watchdog%20Descriptor%20Table.pdf
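The timeout arithmetic described above can be sketched as follows. This is a minimal illustration, not sanlock or wdmd code: the functions `timeout_to_program` and `actual_reset_s` and the `RESET_PERIODS` table are hypothetical. The only driver-specific fact taken from this report is that iTCO_wdt resets after two timeout periods; `softdog` appears only as an assumed example of a driver that resets after a single period.

```python
# Hypothetical table: number of timeout periods that elapse before the
# driver actually resets the machine. Only the iTCO_wdt entry comes from
# this report; any driver not listed is ASSUMED to reset after one period.
RESET_PERIODS = {"iTCO_wdt": 2}


def timeout_to_program(driver: str, reset_after_s: int) -> int:
    """Return the timeout to program so the machine resets after reset_after_s seconds."""
    periods = RESET_PERIODS.get(driver, 1)
    if reset_after_s % periods:
        raise ValueError(f"{reset_after_s}s cannot be split evenly into {periods} periods")
    return reset_after_s // periods


def actual_reset_s(driver: str, programmed_s: int) -> int:
    """Return when the driver really resets the machine for a programmed timeout."""
    return programmed_s * RESET_PERIODS.get(driver, 1)


# iTCO_wdt must be programmed with 30s to reset the machine at 60s.
print(timeout_to_program("iTCO_wdt", 60))  # → 30
# Programming 60s on iTCO_wdt leaves the machine running until 120s.
print(actual_reset_s("iTCO_wdt", 60))      # → 120
# An assumed one-period driver is programmed with the full value.
print(timeout_to_program("softdog", 60))   # → 60
```

Keeping the per-driver multiplier in one place means the same programmed value is never silently reused for drivers whose reset semantics differ.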

RHEV and LVM use sanlock for data protection on a SAN and would be exposed to potential data corruption on machines using iTCO_wdt. An Insights query found nearly 500 RHEL 8 systems running sanlock with iTCO_wdt. There are no known cases of corruption caused by this problem.

Please provide the package NVR for which bug is seen:

How reproducible:

Steps to reproduce

Expected results

Actual results

links to

RHBA-2024:128251 sanlock update

mentioned on

Merge request - Resolves: RHEL-21814

Assignee: David Teigland

Reporter: David Teigland

Developer: David Teigland

QA Contact: Cluster QE

Votes: 0

Watchers: 5

Created: 2024/01/16 5:37 PM

Updated: 2025/08/13 10:01 PM

Resolved: 2024/05/22 10:19 AM

Dev Target end: 2024/01/15

Target end: 2024/01/15

Release Date: 2024/05/22
