OCPBUGS-74172: CRI-O is frequently segfaulting and restarting due to an attempt to pull a non-existent image

• Type: Bug
• Resolution: Unresolved
• Priority: Major
• Affects Version: 4.19.z
• Component: Containers
• Sprint: RUN 283, RUN 284

Issue:
CRI-O is frequently segfaulting and restarting due to an attempt to pull a non-existent image, causing pods to enter "ImagePullBackOff" and disrupting services. The issue was initially reported on May 2, 2025, and escalated to high severity due to its impact on the development stack and the potential to cause problems in production.

Environment:
OCP 4.19.21

Actual Issue:
Core dump analysis revealed a "makeslice: len out of range" error, indicating a memory allocation issue within CRI-O when attempting to pull the specific image. The panic is raised by Go's `makeslice` runtime function, suggesting that the image's layers or metadata are driving an invalid or excessively large slice allocation.
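For context, this panic string comes from the Go runtime itself: `make` panics with "makeslice: len out of range" whenever the requested slice length is negative or larger than the runtime can address. The following is a minimal, self-contained sketch of that behavior, not CRI-O code; `layerSize` is a hypothetical stand-in for a corrupt size field read from image metadata.

```go
// Minimal illustration of the panic seen in the core dump: make() panics
// with "makeslice: len out of range" when given an invalid length.
package main

import "fmt"

func main() {
	// Hypothetical corrupt value, e.g. a length field read from bad image metadata.
	var layerSize int64 = -1

	defer func() {
		if r := recover(); r != nil {
			// Prints: runtime error: makeslice: len out of range
			fmt.Println("recovered:", r)
		}
	}()

	buf := make([]byte, layerSize) // panics here
	_ = buf
}
```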

Steps so far:
1. Customer reported CRI-O segfaulting due to an attempt to pull a non-existent image, causing pods to enter the "ImagePullBackOff" state.
2. Customer provided journalctl logs showing CRI-O attempting to access and pull the problematic image.
3. Engineer requested a coredump, an sosreport, and additional debugging information.
4. Core dump analysis confirmed a "makeslice: len out of range" error, indicating a memory allocation issue.
5. Customer manually pulled the image using `crictl` and `podman`, which also failed with the same error.
6. Customer confirmed the issue was resolved by restarting the affected pods, though the root cause was not fully understood.
7. Customer reopened the case on January 20, 2026, reporting that the issue persists and indicating a need for more resilience in CRI-O's container image layer management (see the sketch after this list).
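As a rough illustration of the resilience called out in step 7, the sketch below validates an untrusted size before allocating, so a corrupt value surfaces as an ordinary error that fails a single pull rather than a runtime panic that restarts the whole process. This is a hypothetical helper under assumed names (`maxLayerBytes`, `readLayer`), not CRI-O's actual code path.

```go
// Hypothetical sketch: reject an implausible layer size before calling make(),
// turning a would-be "makeslice: len out of range" panic into an ordinary error.
package main

import (
	"errors"
	"fmt"
)

// Assumed upper bound for a single layer buffer (10 GiB); the real limit
// would be a design decision in CRI-O, not this number.
const maxLayerBytes = int64(10) << 30

var errBadLayerSize = errors.New("invalid layer size in image metadata")

func readLayer(reportedSize int64) ([]byte, error) {
	if reportedSize < 0 || reportedSize > maxLayerBytes {
		return nil, fmt.Errorf("%w: %d bytes", errBadLayerSize, reportedSize)
	}
	return make([]byte, reportedSize), nil
}

func main() {
	if _, err := readLayer(-1); err != nil {
		fmt.Println("pull failed gracefully:", err) // process keeps running; the pod can retry
	}
}
```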

Next steps:
1. Customer to provide an sosreport and journal logs from the affected node the next time the issue occurs.
2. Discuss and implement a feature request to make CRI-O more resilient to this class of image-pull failure, preventing future occurrences and ensuring stability.

Ask:
Enhance CRI-O's ability to handle this class of image-pull failure without crashing, preventing future occurrences and ensuring node stability.

Assignee: Jan Kaluza (jkaluza)
Reporter: Novonil Choudhuri (rhn-support-nchoudhu)