CoreOS OCP / COS-2705

Impact assessment for OCPBUGS-30096: [4.12][Tracker for RHEL-26706] High Load and Pods Stuck Terminating


    • Type: Spike
    • Resolution: Done
    • Priority: Critical

      Impact assessment for OCPBUGS-30096

      Which 4.y.z to 4.y'.z' updates increase vulnerability?

      This issue is present in the following releases, and updating to them exposes the cluster:

      • 4.12.49 through 4.12.51
      • 4.11.58
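
      As a rough illustration, the affected ranges listed above can be encoded as a small check. This is a minimal sketch, not an official tool: the names `AFFECTED_RANGES`, `parse_version`, and `is_affected_release` are hypothetical, and it assumes plain `x.y.z` version strings with no pre-release suffix.

```python
# Affected OCP releases, copied from the list above; bounds are inclusive.
AFFECTED_RANGES = [
    ((4, 12, 49), (4, 12, 51)),
    ((4, 11, 58), (4, 11, 58)),
]


def parse_version(version: str) -> tuple:
    """Parse an 'x.y.z' string into a tuple of three ints.

    Assumption: no pre-release or build suffix is present.
    """
    major, minor, patch = version.strip().split(".")
    return int(major), int(minor), int(patch)


def is_affected_release(version: str) -> bool:
    """Return True if the release falls inside one of the listed ranges."""
    parsed = parse_version(version)
    return any(low <= parsed <= high for low, high in AFFECTED_RANGES)
```

      For example, `is_affected_release("4.12.50")` is true under these ranges, while `is_affected_release("4.12.48")` is not.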

      Which types of clusters?

      • Likely all, or at least clusters which are near the limits of I/O capacity

      What is the impact? Is it serious enough to warrant removing update recommendations?

      • After rebooting into kernel-4.18.0-372.88.1.el8_6 or a later kernel, nodes experience high load average and high io_wait times
      • Nodes may fail to start or stop pods, and probes may fail
      • Workload and host processes may become unresponsive, and workloads may be disrupted
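
      To tell whether a node is running a kernel at or after the one named above, a kernel string (for example the output of `uname -r`) can be compared against kernel-4.18.0-372.88.1.el8_6. The following is a minimal sketch under stated assumptions: the function name and regular expression are hypothetical, the comparison is a simple numeric tuple comparison rather than full RPM version ordering, only `el8_6` builds are considered, and no upper bound is applied because this issue names no fixed 8.6 kernel.

```python
import re

# First affected build per this issue: kernel-4.18.0-372.88.1.el8_6
FIRST_AFFECTED = (4, 18, 0, 372, 88, 1)

# Matches strings such as "4.18.0-372.88.1.el8_6.x86_64" or
# "kernel-4.18.0-372.88.1.el8_6" (hypothetical pattern for this sketch).
_KERNEL_RE = re.compile(
    r"^(?:kernel-)?(\d+)\.(\d+)\.(\d+)-((?:\d+\.)*\d+)\.(el\d+(?:_\d+)?)"
)


def kernel_at_or_after_regression(kernel: str) -> bool:
    """Return True for el8_6 kernels at or after the first affected build."""
    match = _KERNEL_RE.match(kernel.strip())
    if match is None:
        raise ValueError(f"unrecognized kernel string: {kernel!r}")
    version = tuple(int(match.group(i)) for i in (1, 2, 3))
    release = tuple(int(part) for part in match.group(4).split("."))
    dist = match.group(5)
    if dist != "el8_6":
        # Other streams are outside the scope of this sketch.
        return False
    return version + release >= FIRST_AFFECTED
```

      A result of `True` only indicates that the kernel build is in the range this issue describes; it does not confirm that the node is actually showing the symptoms.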

      How involved is remediation?

      • The kernel would need to be overridden to an unaffected version

      Is this a regression?

      • Yes, this is a kernel regression introduced in kernel-4.18.0-372.88.1.el8_6 and not yet fixed in 8.6 kernels.

      Note: since OCP 4.11 is EOL, we will not ship a subsequent 4.11.z that addresses this. Please either apply the workaround or upgrade to 4.12 when a fix becomes available there.

              rhn-support-sdodson Scott Dodson
              pratikam Pratik Mahajan