          [OCPBUGS-33249] Tracker: RIP: __list_del_entry_valid.cold - "ceph_drop_caps_for_unlink"

          Scott Dodson added a comment -

          Marking this as closed: there will be no upgrade path from an affected 4.15.z release to 4.16, so there is no need for this bug to be noted in the 4.16 release notes.

          OpenShift Jira Bot added a comment -

          Hi rhn-support-sdodson,

          Bugs should not be moved to Verified without first providing a Release Note Type ("Bug Fix" or "No Doc Update"), and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the bug to Verified.

          Adam Piasecki added a comment -

          The fixed kernel is included in version 416.94.202405142341-0.

          Bipin Kunal added a comment -

          Thanks a lot for the confirmation, rhn-support-sdodson.

          Scott Dodson added a comment -

          Note: a 9.4 kernel has been built with this fix, but this is not a week in which we integrate a 9.4 kernel update, so 4.16.0 nightlies will likely receive the fix next Tuesday or Wednesday.

          Scott Dodson added a comment -

          Yes, the fixed kernel is in the 4.12.z through 4.15.z nightlies. I've moved the bugs whose fix is already in accepted builds to VERIFIED and those in builds yet to be accepted to ON_QA, but I expect no problem hitting the end-of-day deadlines for 4.12, 4.14, and 4.15.

          Petr Muller added a comment -

          Based on the impact assessment COS-2781, a known issue / conditional risk for this bug was added to the update graph. The UpdateRecommendationsBlocked and UpgradeBlocker labels were added to this card, and the ImpactStatementRequested and ImpactStatementProposed labels were removed if they were present.

          Details of the conditional risk:

          • Name: CephCapDropPanic
          • Summary: Nodes in clusters running workloads that mount Ceph volumes may experience kernel panics due to a CephFS client bug.
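          Once a conditional risk like this is in the update graph, it is attached to the affected update targets listed in the ClusterVersion `status.conditionalUpdates` field (`oc adm upgrade --include-not-recommended` should show the same information in human-readable form). Below is a minimal Python sketch for finding which targets carry `CephCapDropPanic`, assuming the parsed JSON of `oc get clusterversion version -o json` as input. The function name `versions_with_risk` and the sample object are hypothetical illustrations, not real cluster data.

```python
from __future__ import annotations

# Name of the conditional risk as recorded in the comment above.
RISK_NAME = "CephCapDropPanic"


def versions_with_risk(cluster_version: dict, risk_name: str = RISK_NAME) -> list[str]:
    """Return target release versions whose conditional update lists the named risk.

    Assumes the ClusterVersion layout status.conditionalUpdates[].release.version
    and status.conditionalUpdates[].risks[].name.
    """
    status = cluster_version.get("status", {})
    matches = []
    for update in status.get("conditionalUpdates") or []:
        # Collect the risk names attached to this conditional update edge.
        risk_names = {risk.get("name") for risk in update.get("risks") or []}
        if risk_name in risk_names:
            matches.append(update.get("release", {}).get("version", "<unknown>"))
    return matches


# Hypothetical, trimmed ClusterVersion object for illustration only; real input
# would be the parsed JSON from `oc get clusterversion version -o json`.
sample = {
    "status": {
        "conditionalUpdates": [
            {
                "release": {"version": "hypothetical-target-a"},
                "risks": [{"name": "CephCapDropPanic", "message": "placeholder"}],
            },
            {
                "release": {"version": "hypothetical-target-b"},
                "risks": [{"name": "SomeOtherRisk", "message": "placeholder"}],
            },
        ]
    }
}

if __name__ == "__main__":
    print(versions_with_risk(sample))  # ['hypothetical-target-a']
```

          Matching on the risk name rather than the message keeps the check stable if the human-readable summary is reworded later.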

          Petr Muller added a comment -

          This card has been labeled as a potential upgrade risk with the UpgradeBlocker label. We have created card COS-2781 to help us understand the impact of the bug so that we can warn exposed cluster owners about it before they upgrade to an affected OCP version. The card simply asks for answers to several questions and should not take much time to answer.

          Scott Dodson added a comment - edited

          Per rhn-support-mcaldeir, we should expect the majority of clusters using Ceph on affected versions to experience problems.

          Affected versions are 4.15.0+, 4.14.14+, 4.13.36+, and 4.12.54+, based on the KCS article stating that the regression was introduced in:

          > The issue has been observed in RHEL 8.6 using kernel 4.18.0-372.98.1.el8_6 or higher
          > A similar issue could be observed in RHEL 9.x using kernel 5.14.0-284.54.1.el9_2 or higher
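          The version thresholds above can be expressed as a small check. This is a minimal Python sketch, assuming OCP versions of the form X.Y.Z; the names `FIRST_AFFECTED` and `at_or_after_regression` are hypothetical. It only encodes the lower bounds quoted in this comment, not the z-streams that later shipped the fixed kernel, so a True result means "at or past the regression point", not "still affected".

```python
from __future__ import annotations

# First affected z-stream per (major, minor), as quoted in the comment:
# 4.15.0+, 4.14.14+, 4.13.36+, 4.12.54+.
FIRST_AFFECTED = {
    (4, 12): 54,
    (4, 13): 36,
    (4, 14): 14,
    (4, 15): 0,
}


def parse_ocp_version(version: str) -> tuple[int, int, int]:
    """Parse 'X.Y.Z' into integers; a pre-release suffix such as '-rc.1' is dropped."""
    core = version.split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def at_or_after_regression(version: str) -> bool:
    """Return True if the version is at or past the first affected z-stream for its minor."""
    major, minor, patch = parse_ocp_version(version)
    first = FIRST_AFFECTED.get((major, minor))
    if first is None:
        # Minor versions not listed in the comment are treated as unaffected here.
        return False
    return patch >= first


if __name__ == "__main__":
    for v in ["4.12.53", "4.12.54", "4.13.36", "4.14.13", "4.15.2", "4.16.0"]:
        print(v, at_or_after_regression(v))
```

          Comparing integer tuples avoids the string-ordering trap where "4.12.9" would sort after "4.12.54".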

           

          OpenShift Jira Bot added a comment -

          Thanks for reporting your issue!

          In order for the CoreOS team to be able to triage your issue, please copy the applicable parts of the following template into a comment and fill them out as completely as possible.

          • OCP Version at Install Time:
          • RHCOS Version at Install Time:
          • OCP Version after Upgrade (if applicable):
          • RHCOS Version after Upgrade (if applicable):
          • Platform (AWS, Azure, bare metal, GCP, vSphere, etc.):
          • Architecture (x86_64, ppc64le, s390x, etc.):

          If you're having problems booting/installing RHCOS, please provide:

          • The full contents of the serial console showing disk initialization, network configuration, and Ignition stages.
            • See this article for information about configuring your serial console.
            • Screenshots or a video recording of the console is usually not sufficient.
          • The full Ignition config (JSON format)

          If you're having problems post-installation or post-upgrade, please provide:

          • An sos report for affected nodes. See this documentation page for instructions on how to gather one.
          • A complete must-gather (oc adm must-gather)

          If you're having SELinux related issues, please provide:

          • The full /var/log/audit/audit.log file
          • Were any SELinux modules or booleans changed from the default configuration?
          • The output of ostree admin config-diff | grep selinux/targeted on impacted nodes

            rhn-support-sdodson Scott Dodson
            rhn-support-sdodson Scott Dodson
            Michael Nguyen Michael Nguyen