RHEL-7710

Filesystem: Improve stopping for large filesystems (RHEL7)


    • rhel-ha
    • ssg_filesystems_storage_and_HA

      Description of problem:

      On high-end production systems with a large amount of RAM used as write cache and large XFS file systems (8 TiB or more), a single unmount attempt can take longer than 10 minutes, even when it ultimately fails because processes are still using the file system. If users' login shells are running on the file system managed by the Filesystem resource, they do not respond to SIGTERM, only to SIGHUP. As a result, when the resource is stopped, the unmount keeps failing and the stop operation fails or times out.
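
      The signal behaviour described above can be illustrated with a small Python sketch. An interactive bash ignores SIGTERM when no trap is set but exits on SIGHUP, so a stop routine that only sends SIGTERM leaves such shells holding the file system. The sketch assumes a hypothetical stop routine that escalates through SIGTERM, SIGHUP and SIGKILL with a fixed delay; the names `escalate`, `wait_for_exit` and `start_shell_like_child`, the signal order and the delay are illustrative assumptions, not taken from the resource agent. The child process only stands in for a login shell.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for an interactive login shell:
# it ignores SIGTERM but keeps the default (terminate) action for SIGHUP.
SHELL_LIKE_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def wait_for_exit(proc: subprocess.Popen, timeout: float) -> bool:
    """Return True if the process exits within `timeout` seconds."""
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        return False


def escalate(proc: subprocess.Popen,
             signals=(signal.SIGTERM, signal.SIGHUP, signal.SIGKILL),
             delay: float = 1.0):
    """Send each signal in order, waiting up to `delay` seconds after each.

    Returns the signal after which the process exited, or None if it survived.
    """
    for sig in signals:
        proc.send_signal(sig)
        if wait_for_exit(proc, delay):
            return sig
    return None


def start_shell_like_child() -> subprocess.Popen:
    """Start the stand-in process and wait until it has ignored SIGTERM."""
    proc = subprocess.Popen([sys.executable, "-c", SHELL_LIKE_CHILD],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # the child prints 'ready' after installing SIG_IGN
    return proc


if __name__ == "__main__":
    child = start_shell_like_child()
    try:
        stopped_by = escalate(child)
    finally:
        child.stdout.close()
    # SIGTERM alone does not stop the child; SIGHUP does.
    print(stopped_by.name if stopped_by else "still running")
```

      Under these assumptions, a stop action that sends only SIGTERM before retrying the unmount leaves such shells running, so each (potentially very long) unmount retry fails with the file system still busy.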

      Version-Release number of selected component (if applicable):

      resource-agents-4.1.1-61.el7_9.15.x86_64

      How reproducible:
      Repeatedly

      Steps to Reproduce:
      1. Create a large Filesystem resource with potentially long dirty unmount cycles (approximately 30 minutes), with users' login shells on it.
      2. Log in again while the long stop operation is running. (Login shells on a file system managed by the HA Filesystem resource agent do not by themselves make every standard stop operation fail.)

      Actual results:
      The unmount fails, causing the stop operation to fail.

      Expected results:
      The unmount succeeds.

      Additional info:

              rhn-engineering-oalbrigt Oyvind Albrigtsen
              rhn-support-pzimek1 Pepa Zimek
              Oyvind Albrigtsen
              Cluster QE
