-
Story
-
Resolution: Unresolved
-
Minor
-
None
-
rhel-9.2.0
-
rhel-sst-filesystems
-
ssg_filesystems_storage_and_HA
-
5
GFS2 currently doesn't allow a glock to be taken by the same task more than once; trying to take the same glock a second time triggers a BUG in add_to_queue(). As a workaround, gfs2_glock_is_locked_by_me() is used in several places to check whether the glock is already held, followed by checks like gfs2_glock_is_held_excl().
The problem is that gfs2_glock_is_held_excl() checks gl->gl_state, which describes the state in which the node holds the glock, not the state requested by the current task's holder. So when the current task holds a glock in LM_ST_SHARED or LM_ST_DEFERRED state, the node may still have the glock cached in LM_ST_EXCLUSIVE state. In that situation, a successful gfs2_glock_is_held_excl() check doesn't ensure that the glock will remain locked in LM_ST_EXCLUSIVE state.
A possible workaround would be to "upgrade" the current holder (which is returned by gfs2_glock_is_locked_by_me()) to LM_ST_EXCLUSIVE. However, the glock would then remain locked in that state longer than necessary. A better solution would be to recognize the self-recursion and to allow the task to hold the glock a second time. In a LM_FLAG_TRY situation, we could even try to upgrade the glock.
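The state mismatch behind this problem can be illustrated with a small user-space toy model. This is only a sketch and not kernel code: the toy_* types and functions are hypothetical stand-ins, with toy_locked_by_me() playing the role of gfs2_glock_is_locked_by_me() and toy_held_excl() the role of gfs2_glock_is_held_excl(). It assumes a glock carries one node-level state plus a list of per-task holders, each with its own requested state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: names mirror GFS2 concepts, but none of this is kernel code. */
enum toy_state { TOY_UNLOCKED, TOY_SHARED, TOY_DEFERRED, TOY_EXCLUSIVE };

struct toy_holder {
    int owner;              /* stands in for the owning task */
    enum toy_state state;   /* state this holder asked for */
};

#define TOY_MAX_HOLDERS 4

struct toy_glock {
    enum toy_state node_state;  /* state cached by the node, like gl->gl_state */
    struct toy_holder holders[TOY_MAX_HOLDERS];
    size_t n_holders;
};

static void toy_add_holder(struct toy_glock *gl, int owner, enum toy_state st)
{
    assert(gl->n_holders < TOY_MAX_HOLDERS);
    gl->holders[gl->n_holders].owner = owner;
    gl->holders[gl->n_holders].state = st;
    gl->n_holders++;
}

/* Hypothetical analogue of gfs2_glock_is_locked_by_me(): find the caller's holder. */
static struct toy_holder *toy_locked_by_me(struct toy_glock *gl, int me)
{
    for (size_t i = 0; i < gl->n_holders; i++)
        if (gl->holders[i].owner == me)
            return &gl->holders[i];
    return NULL;
}

/* Hypothetical analogue of gfs2_glock_is_held_excl(): looks only at the node state. */
static bool toy_held_excl(const struct toy_glock *gl)
{
    return gl->node_state == TOY_EXCLUSIVE;
}

/*
 * Returns true when the "locked by me, and held exclusively" check passes
 * even though the calling task itself only holds the glock in shared state.
 */
static bool toy_demo_mismatch(void)
{
    struct toy_glock gl = { .node_state = TOY_EXCLUSIVE };
    struct toy_holder *gh;

    toy_add_holder(&gl, 1, TOY_SHARED);  /* task 1 holds it shared */
    gh = toy_locked_by_me(&gl, 1);
    return gh != NULL && toy_held_excl(&gl) && gh->state != TOY_EXCLUSIVE;
}
```

In this model toy_demo_mismatch() returns true: both checks pass, yet nothing the task holds prevents the node-level state from later dropping below exclusive.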
This would reduce the number of gfs2_glock_is_locked_by_me() exceptions in the code. I'm also convinced that gfs2_glock_is_locked_by_me() and gfs2_glock_is_held_*() are currently used in unsafe ways; reworking the locking along these lines would allow us to clean up those uses properly.
Examples are the locking in gfs2_update_time() and the retries in gfs2_fault() and gfs2_page_mkwrite(), which would sometimes be avoidable.
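As a rough illustration of the self-recursion idea, the following user-space sketch lets a task take a glock it already holds by bumping a per-holder count instead of failing, and only attempts an upgrade when a try flag is set. Everything here is an assumption made for illustration: the toy_* names, the depth counter, the rule that an exclusive holder covers any weaker request, and the rule that an upgrade is only attempted when the task is the sole holder. It does not reflect how add_to_queue() works or would have to be changed, and it models neither release nor conflicts between different tasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the GFS2 implementation. */
enum toy_state { TOY_UNLOCKED, TOY_SHARED, TOY_DEFERRED, TOY_EXCLUSIVE };
enum toy_result { TOY_OK, TOY_RECURSED, TOY_UPGRADED, TOY_REJECTED };

struct toy_holder {
    int owner;              /* stands in for the owning task */
    enum toy_state state;   /* state this holder currently has */
    unsigned int depth;     /* hypothetical recursion count */
};

#define TOY_MAX_HOLDERS 4

struct toy_glock {
    struct toy_holder holders[TOY_MAX_HOLDERS];
    size_t n_holders;
};

static struct toy_holder *toy_find(struct toy_glock *gl, int me)
{
    for (size_t i = 0; i < gl->n_holders; i++)
        if (gl->holders[i].owner == me)
            return &gl->holders[i];
    return NULL;
}

/* try_flag loosely stands in for an LM_FLAG_TRY style request. */
static enum toy_result toy_acquire(struct toy_glock *gl, int me,
                                   enum toy_state want, bool try_flag)
{
    struct toy_holder *gh = toy_find(gl, me);

    if (gh == NULL) {
        /* First acquisition by this task; cross-task conflicts are not modeled. */
        assert(gl->n_holders < TOY_MAX_HOLDERS);
        gh = &gl->holders[gl->n_holders++];
        gh->owner = me;
        gh->state = want;
        gh->depth = 1;
        return TOY_OK;
    }
    if (gh->state == want || gh->state == TOY_EXCLUSIVE) {
        /* Self-recursion: the existing holder already covers the request. */
        gh->depth++;
        return TOY_RECURSED;
    }
    if (try_flag && want == TOY_EXCLUSIVE && gl->n_holders == 1) {
        /* Opportunistic upgrade, attempted only for try-style requests. */
        gh->state = TOY_EXCLUSIVE;
        gh->depth++;
        return TOY_UPGRADED;
    }
    /* The caller must fall back to another strategy, such as a retry. */
    return TOY_REJECTED;
}
```

A caller that already holds the glock in shared state would get TOY_RECURSED for a second shared request, TOY_REJECTED for a blocking exclusive request, and TOY_UPGRADED for an exclusive request made with the try flag while it is the only holder.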