- Bug
- Resolution: Done-Errata
- Critical
- None
- kernel-4.18.0-477.57.1.el8_8
- None
- Low
- ZStream
- rhel-sst-logical-storage
- ssg_filesystems_storage_and_HA
- 3
- False
-
- None
- None
- None
This is a clone of issue RHEL-22698 to use for version rhel-8.8.0.z
–
Original description:
A customer's system has experienced I/O hangs when writing to a raid5 md device. The hang and deadlock occur when a sync check is also running against the md device.
At first, the issue looks similar to RHEL-12284 due to the hung sync check being stuck under raid5_get_active_stripe.
crash> ps -m 1076372
[2 05:08:34.428] [UN] PID: 1076372 TASK: ffffa031c545c000 CPU: 4 COMMAND: "md0_resync"
crash> bt 1076372
PID: 1076372 TASK: ffffa031c545c000 CPU: 4 COMMAND: "md0_resync"
#0 [ffffc1f2e2f67be0] __schedule at ffffffff85c0e191
#1 [ffffc1f2e2f67c38] schedule at ffffffff85c0e785
#2 [ffffc1f2e2f67c50] raid5_get_active_stripe at ffffffffc06ba90e [raid456]
#3 [ffffc1f2e2f67cd0] raid5_sync_request at ffffffffc06bf88d [raid456]
#4 [ffffc1f2e2f67d38] md_do_sync.cold.94 at ffffffff85996b8f
#5 [ffffc1f2e2f67eb8] md_thread at ffffffff859887d4
#6 [ffffc1f2e2f67f10] kthread at ffffffff8531e974
#7 [ffffc1f2e2f67f50] ret_from_fork at ffffffff85e0028f
However, this call to raid5_get_active_stripe does not have a stripe_request_ctx, and so there is no batch_last to cause the deadlock described in RHEL-12284.
Instead, the main md thread for the device was deadlocked in raid5d.
crash> ps -m 3181
[0 15:43:48.096] [UN] PID: 3181 TASK: ffffa02e2a87c000 CPU: 11 COMMAND: "md0_raid5"
crash> bt 3181
PID: 3181 TASK: ffffa02e2a87c000 CPU: 11 COMMAND: "md0_raid5"
#0 [ffffc1f2ce3e7d48] __schedule at ffffffff85c0e191
#1 [ffffc1f2ce3e7da0] schedule at ffffffff85c0e785
#2 [ffffc1f2ce3e7db8] raid5d at ffffffffc06be7f7 [raid456]
#3 [ffffc1f2ce3e7eb8] md_thread at ffffffff859887d4
#4 [ffffc1f2ce3e7f10] kthread at ffffffff8531e974
#5 [ffffc1f2ce3e7f50] ret_from_fork at ffffffff85e0028f
raid5d was waiting on the MD_SB_CHANGE_PENDING bit, which was set along with the MD_SB_CHANGE_CLEAN bit.
crash> struct mddev.ro,sb_flags ffff9fbf8a6e9000
ro = 0,
sb_flags = 6,
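For reference, sb_flags = 6 decodes to the MD_SB_CHANGE_CLEAN and MD_SB_CHANGE_PENDING bits, assuming the mddev_sb_flags enum layout matches upstream drivers/md/md.h:
enum mddev_sb_flags {
	MD_SB_CHANGE_DEVS,	/* bit 0: some device status has changed */
	MD_SB_CHANGE_CLEAN,	/* bit 1: transition to or from 'clean' */
	MD_SB_CHANGE_PENDING,	/* bit 2: switch from 'clean' to 'active' in progress */
	MD_SB_NEED_REWRITE,	/* bit 3: metadata write needs to be repeated */
};
/* sb_flags == 6 == (1 << MD_SB_CHANGE_CLEAN) | (1 << MD_SB_CHANGE_PENDING) */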
These bits were set because several tasks had called md_write_start and were waiting there for the device to transition from clean to active. One example:
crash> bt 3261
PID: 3261 TASK: ffffa00050914000 CPU: 7 COMMAND: "xfsaild/dm-9"
#0 [ffffc1f2cdf879e8] __schedule at ffffffff85c0e191
#1 [ffffc1f2cdf87a40] schedule at ffffffff85c0e785
#2 [ffffc1f2cdf87a58] md_write_start at ffffffff8598c72d
#3 [ffffc1f2cdf87ab0] raid5_make_request at ffffffffc06bac63 [raid456]
#4 [ffffc1f2cdf87be0] md_handle_request at ffffffff8598c0ae
#5 [ffffc1f2cdf87c48] md_make_request at ffffffff8598c25b
#6 [ffffc1f2cdf87c68] generic_make_request_no_check at ffffffff856a16e2
#7 [ffffc1f2cdf87cc8] submit_bio at ffffffff856a187c
#8 [ffffc1f2cdf87d08] _xfs_buf_ioapply at ffffffffc096e97f [xfs]
#9 [ffffc1f2cdf87db0] __xfs_buf_submit at ffffffffc09702a3 [xfs]
#10 [ffffc1f2cdf87dd8] xfs_buf_delwri_submit_buffers at ffffffffc097108c [xfs]
#11 [ffffc1f2cdf87e68] xfsaild at ffffffffc09a843f [xfs]
#12 [ffffc1f2cdf87f10] kthread at ffffffff8531e974
#13 [ffffc1f2cdf87f50] ret_from_fork at ffffffff85e0028f
I suspect these md functions deadlocked due to a race condition introduced by upstream commit 5e2cf333b7b "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d", which itself tried to fix a different race condition.
md_write_start will try to shift a clean array to active, setting the MD_SB_CHANGE_PENDING and MD_SB_CHANGE_CLEAN flags seen set on the mddev struct above. It then wakes up the main thread for the md device, expecting it to run md_check_recovery to transition the device and its array to active.
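For context, a minimal sketch of the relevant md_write_start logic, abbreviated from upstream drivers/md/md.c (the exact RHEL 8 backport may differ slightly):
	/* md_write_start(): clean -> active transition (abbreviated) */
	if (mddev->in_sync || mddev->sync_checkers) {
		spin_lock(&mddev->lock);
		if (mddev->in_sync) {
			mddev->in_sync = 0;
			set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags);
			set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
			/* expects the main md thread to run md_check_recovery() */
			md_wakeup_thread(mddev->thread);
			did_change = 1;
		}
		spin_unlock(&mddev->lock);
	}
	...
	/* the writer then blocks until MD_SB_CHANGE_PENDING is cleared */
	wait_event(mddev->sb_wait,
		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) ||
		   mddev->suspended);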
But this has a race condition with the fix added by commit 5e2cf333b7b.
static void raid5d(struct md_thread *thread)
{
...
		if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) {
			spin_unlock_irq(&conf->device_lock);
			md_check_recovery(mddev);
			spin_lock_irq(&conf->device_lock);
			continue;
		}

		wait_event_lock_irq(mddev->sb_wait,
			!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags),
			conf->device_lock);
...
The commit intends raid5d to wait only when MD_SB_CHANGE_PENDING alone is set; if any other sb_flags bit is set, it calls md_check_recovery as md_write_start expects. However, the if statement and wait_event_lock_irq read sb_flags separately, and the two checks can race: md_write_start can set MD_SB_CHANGE_CLEAN and MD_SB_CHANGE_PENDING after the if statement sees sb_flags as 0 but before wait_event_lock_irq tests MD_SB_CHANGE_PENDING. When this race occurs, raid5d becomes stuck in a state where it can never run md_check_recovery to clear MD_SB_CHANGE_PENDING. The raid5 code uses MD_SB_CHANGE_PENDING for multiple purposes and is unable to safely distinguish between the different states.
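A schematic interleaving of the two threads (illustrative ordering reconstructed from the vmcore, not an exact trace):
    md0_raid5 (raid5d)                              writer (md_write_start, e.g. xfsaild)
    ------------------                              -------------------------------------
    reads mddev->sb_flags == 0, so the "other
    bits set" branch and its md_check_recovery()
    call are skipped
                                                    set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags)
                                                    set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags)
                                                    md_wakeup_thread(mddev->thread)
                                                    wait_event(mddev->sb_wait, ...) -> sleeps
    wait_event_lock_irq(mddev->sb_wait,
        !test_bit(MD_SB_CHANGE_PENDING, ...)) -> sleeps
    Neither thread can make progress: raid5d is the thread that must run
    md_check_recovery() to clear MD_SB_CHANGE_PENDING, but it is now waiting
    on that same bit.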
A vmcore of this deadlock is on galvatron-x86.cee.redhat.com at /cores/retrace/tasks/755512355/crash/vmcore
What were you trying to do that didn't work?
Writes to a raid5 md device may hang and deadlock while a sync check is running.
Please provide the package NVR for which the bug is seen:
kernel 4.18.0-513.11.1.el8_9.x86_64
How reproducible:
Occurs randomly for one customer when the weekly raid check cron job runs.