Story
Resolution: Can't Do
rhel-sst-upgrades
Goal: Provide a mechanism to check and repair local filesystem problems during upgrade
Example: XFS AGFL corruption detected after upgrading from RHEL7 to RHEL8
Customers upgrading from RHEL7 to RHEL8 may be exposed to AGFL corruption issues due to a bug in the XFS v5 filesystem format on RHEL7. This can result in a warning being logged to dmesg and a number of filesystem blocks being leaked[1].
XFS (dm-6): WARNING: Reset corrupted AGFL on AG 24. 6 blocks leaked. Please unmount and run xfs_repair
When this message is printed, the filesystem corrects the problem itself, leaking a small number of blocks. However, customers who see this message are concerned that it could indicate a real problem, which leads them to open cases with support.
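To make the warning above easier to triage, here is a minimal sketch of how it could be extracted from log text. The pattern is modelled only on the single sample message quoted in this story, and the names `AGFL_WARNING` and `parse_agfl_warnings` are hypothetical; other kernel versions may word the message differently.

```python
import re

# Pattern based on the one sample line quoted in this story (assumption:
# the wording is stable across the kernels of interest).
AGFL_WARNING = re.compile(
    r"XFS \((?P<device>[^)]+)\): WARNING: Reset corrupted AGFL on AG "
    r"(?P<ag>\d+)\. (?P<leaked>\d+) blocks leaked\."
)


def parse_agfl_warnings(log_text):
    """Return one dict per AGFL reset warning found in log_text."""
    return [
        {
            "device": match.group("device"),
            "ag": int(match.group("ag")),
            "leaked": int(match.group("leaked")),
        }
        for match in AGFL_WARNING.finditer(log_text)
    ]
```

Feeding it the output of `dmesg` would give a per-device list of affected allocation groups, which could then be summarised in a report.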
There is a KCS article[2] for this issue. However, the existence of these warnings is problematic, and I don't think we can say definitively that this issue can never cause a problem, even though the probability seems extremely low and no issues have yet been seen in the field.
[1] xfs: detect agfl count corruption and reset agfl
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a27ba2607e60312554cbcd43fc660b2c7f29dc9c
[2] "WARNING: Reset corrupted AGFL" seen in logs on RHEL
https://access.redhat.com/solutions/7049129
Notes: The XFS AGFL problem was not found early enough in the Leapp lifecycle for RHEL7 to RHEL8 to be addressed during normal development, but it has recently begun causing a support burden, possibly due to RHEL7 transitioning to EOL. A broader meta-story may be what options Red Hat support or customers have to address specific issues outside of the main Leapp development cycle.
Risks: Block layer problems should always be resolved before running fsck or repair, for example if a volume is only partially assembled by LVM or dmraid. Traditionally, LVM refuses to activate a partial volume, but this case should still be taken into account.
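As an illustration of the risk above, the following sketch flags logical volumes that LVM reports as partial so that no check or repair is attempted on them. It assumes input in the form produced by `lvs --noheadings -o lv_name,lv_attr`, and that the ninth `lv_attr` character is the volume health flag with `p` meaning partial, as described in the lvs man page; this should be verified on the target release. The function name `partial_volumes` is hypothetical, and dmraid is not covered.

```python
def partial_volumes(lvs_output):
    """Return names of logical volumes whose health flag is 'p' (partial).

    Expects one "<lv_name> <lv_attr>" pair per line. Lines that do not
    have exactly two fields are skipped.
    """
    partial = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        name, attr = fields
        # Assumption: position 9 (index 8) of lv_attr is volume health.
        if len(attr) >= 9 and attr[8] == "p":
            partial.append(name)
    return partial
```

A real check would probably also include the volume group name, since logical volume names are only unique within a volume group.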
Acceptance Criteria
A list of verification conditions, successful functional tests, or expected outcomes in order to declare this story/task successfully completed.
- Upgrade can deal with small problems with local filesystems including XFS, Ext3, Ext4, FAT
- Upgrade can repair the AGFL example above (see KCS solution for synthetic reproducer).
- Upgrade can repair EFI FAT problems
- Upgrade can repair ext4 problems
- Upgrade can detect and report on problems (fsck -n, xfs_repair -n).
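The detect-and-report criterion could be sketched as a small dispatcher that picks a no-modify check command per filesystem type. This is only an illustration under stated assumptions: the function name `readonly_check_command` is hypothetical, the filesystem type strings (`xfs`, `ext3`, `ext4`, `vfat`) are assumed to match what the system reports, and the sketch only builds the command line without running it or interpreting exit codes.

```python
def readonly_check_command(fstype, device):
    """Return the argv for a no-modify check, or None if unsupported.

    Uses only the two invocations named in the acceptance criteria:
    "xfs_repair -n" for XFS and "fsck -n" for the other filesystems.
    """
    if fstype == "xfs":
        return ["xfs_repair", "-n", device]
    if fstype in ("ext3", "ext4", "vfat"):
        return ["fsck", "-n", device]
    # Unknown filesystem type: report nothing rather than guess a tool.
    return None
```

For example, `readonly_check_command("xfs", "/dev/mapper/rhel-root")` yields a command that could be run against an unmounted filesystem, with the result written to the upgrade report.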
- relates to
RHEL-25699 Adding a warning message for file system check in the `leapp-report.txt`.
- Planning