OpenShift Request For Enhancement: RFE-8084

[RFE] Save the core dump file in the container itself when a containerized process segfaults


    • Type: Feature Request
    • Resolution: Unresolved
    • Priority: Normal
    • RHEL CoreOS
    • Product / Portfolio Work

      1. Proposed title of this feature request:

      A new tool / utility in OCP that saves the core dump file of a segfaulted containerized process with a proper naming format (container + process name).
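
      As an illustration of the naming format requested above, here is a minimal sketch of how such a utility might compose the file name. The function name, the field order, and the extra namespace, pod, PID, and timestamp fields are assumptions made for this example; the request itself only asks for container + process name.

```python
import re
from datetime import datetime, timezone


def core_file_name(namespace, pod, container, process, pid, crashed_at):
    """Build a core file name that identifies the crashed workload.

    Hypothetical helper: the field set and ordering are illustrative only.
    """
    def clean(part):
        # Keep only characters that are safe in a file name. Dots are
        # replaced too, so '.' stays an unambiguous field separator.
        return re.sub(r"[^A-Za-z0-9_-]", "_", part)

    # Use a UTC timestamp so names sort chronologically across nodes.
    stamp = crashed_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    fields = [clean(namespace), clean(pod), clean(container),
              clean(process), str(pid), stamp]
    return "core." + ".".join(fields)


if __name__ == "__main__":
    when = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
    print(core_file_name("tenant-a", "billing-7d9f", "worker",
                         "billing_srv", 4242, when))
```

      With a name of this shape, a tenant could tell which pod and container a core file came from without inspecting the file or asking for node access.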

      2. What is the nature and description of the request?

      I have a customer who is looking for a feature in OCP: users should be able to map a core dump file to the container and pod that produced it. Currently the core dump is stored on the host OS itself, and there are two problems with this approach:

      1. Not every customer has access to the node / host OS.
      2. Customers cannot map the core dump file to the container / pod.

      So there needs to be a utility at the OCP level that applies cluster-wide, lets users retrieve their core dumps easily, and lets them trace a core dump file back to the pod that actually crashed.
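
      To make the mapping step concrete, here is a hedged sketch of one way a node-level collector could recover the pod UID and container ID of a crashed process from a line of /proc/<pid>/cgroup. The cgroup path layout shown (systemd-style slices with a crio-<id>.scope leaf) is an assumption about how the node is configured, and parse_cgroup_line is a hypothetical helper, not an existing OCP interface.

```python
import re

# Assumed layout (systemd cgroup driver with CRI-O), for example:
# 0::/kubepods.slice/kubepods-burstable.slice/
#     kubepods-burstable-pod<uid with '_'>.slice/crio-<64 hex chars>.scope
POD_RE = re.compile(
    r"pod([0-9a-f]{8}[_-][0-9a-f]{4}[_-][0-9a-f]{4}[_-][0-9a-f]{4}[_-][0-9a-f]{12})"
)
CTR_RE = re.compile(r"crio-([0-9a-f]{64})")


def parse_cgroup_line(line):
    """Return (pod_uid, container_id) for a cgroup line, or None.

    Hypothetical helper: returns None when the line does not look like
    it belongs to a pod container under the assumed layout.
    """
    # A cgroup line is 'hierarchy-id:controllers:path'; keep the path.
    path = line.strip().split(":", 2)[-1]
    pod = POD_RE.search(path)
    ctr = CTR_RE.search(path)
    if pod is None or ctr is None:
        return None
    # Slice names use '_' where the pod UID has '-'.
    return pod.group(1).replace("_", "-"), ctr.group(1)


if __name__ == "__main__":
    sample = ("0::/kubepods.slice/kubepods-burstable.slice/"
              "kubepods-burstable-pod1234abcd_5678_90ab_cdef_1234567890ab.slice/"
              "crio-" + "a" * 64 + ".scope")
    print(parse_cgroup_line(sample))
```

      The pod UID and container ID could then be resolved to a namespace, pod name, and container name through the cluster API, which is the attribution the request asks for.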

      3. Why does the customer need this? (List the business requirements here)

      This customer has hundreds of C++ processes running in hundreds of pods / containers. Their containerized processes often segfault, and by OCP design the core dump is captured on the host OS / node. Collecting these core dumps from the nodes is difficult for them for several reasons, for example:

      a) This design either cripples their troubleshooting capabilities or causes friction with their clients, who usually own the clusters.
      b) They can't get the core dump without asking the cluster administrator.
      c) There is a risk of polluting the OS file system if core files are generated frequently.

      You can visualize the situation like this: a private, medium‑sized OpenShift cluster (40–50 nodes) hosts about 20 tenant organizations. Each tenant runs multiple C++ workloads, resulting in hundreds of C++ pods overall. Two tenants deploy application patches on the same day. The first patch introduces a defect that causes the application to generate hundreds of core files. Independently, the second tenant’s patch also introduces a defect that produces its own set of core files. Without reliable mechanisms to attribute, isolate, and deliver core files to the correct tenant, the shared core dump accumulation becomes an immediate operational obstacle, delaying root cause analysis for all affected parties. Even if such an event occurs rarely (e.g., once per year), the inability to efficiently map core files to their originating workloads significantly impairs timely troubleshooting in a multi‑tenant environment.

      Now imagine a 2000 node cluster with 500 tenants.

       

      4. List any affected packages or components.

      OCP 4.XX

              rhn-support-mrussell Mark Russell
              rhn-support-ybabar Yogesh Babar