  1. OpenShift Container Platform (OCP) Strategy
  2. OCPSTRAT-1115

[Upstream] Consider terminating pods in job controller


    • Type: Feature
    • Resolution: Done
    • Priority: Critical
    • BU Product Work
    • 0% To Do, 0% In Progress, 100% Done

      As an OpenShift administrator

      1. I want to implement a pod failure policy in the job controller so that terminating pods (those with a deletionTimestamp) are not immediately replaced and do not count as failed until they reach a terminal phase (Failed or Succeeded). This makes the handling of pod failures more efficient and accurate.
      2. I want to avoid creating replacement pods for pods that are terminating but have not yet reached the Failed or Succeeded phase, so that I can prevent unnecessary resource allocation and align the creation of replacement pods with the pod failure policy.
      3. I want to extend the kubelet to mark pending terminating pods as failed so that the transition of these pods into the Failed phase is clearer and more consistent, improving the management of pod lifecycles.
      4. I want to add a DisruptionTarget condition to pods preempted by the kubelet to make room for critical pods, so that there is better visibility into, and management of, pods disrupted in favor of critical workloads.
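
      The behavior requested in items 1 and 2 can be illustrated with a small model. This is only a sketch under stated assumptions, not the job controller's actual code: the Pod class and the helper names (is_terminating, counts_as_failed, free_slots) are hypothetical, and Job completions tracking is deliberately ignored.

```python
from dataclasses import dataclass
from typing import List, Optional

# Terminal pod phases named in the user stories above.
TERMINAL_PHASES = {"Failed", "Succeeded"}


@dataclass
class Pod:
    """Hypothetical, simplified stand-in for a Kubernetes Pod object."""
    name: str
    phase: str  # "Pending", "Running", "Succeeded", or "Failed"
    deletion_timestamp: Optional[str] = None  # set once deletion is requested


def is_terminating(pod: Pod) -> bool:
    """A pod is terminating if deletion was requested but it is not yet terminal."""
    return pod.deletion_timestamp is not None and pod.phase not in TERMINAL_PHASES


def counts_as_failed(pod: Pod) -> bool:
    """Only a pod that has actually reached the Failed phase counts as failed."""
    return pod.phase == "Failed"


def free_slots(pods: List[Pod], parallelism: int) -> int:
    """Slots not occupied by active or terminating pods.

    Terminating pods still occupy a slot, so no replacement is created
    for them until they reach a terminal phase.
    """
    occupying = [p for p in pods if p.phase not in TERMINAL_PHASES]
    return max(0, parallelism - len(occupying))


pods = [
    Pod("a", "Running"),
    Pod("b", "Running", deletion_timestamp="2024-01-01T00:00:00Z"),  # terminating
    Pod("c", "Failed", deletion_timestamp="2024-01-01T00:00:00Z"),   # fully terminated
]
print(free_slots(pods, parallelism=3))  # prints 1: only pod "c" has freed its slot
```

      In this model, pod "b" neither counts as failed nor frees a slot while it is terminating; once its phase becomes Failed, it would count as failed and a replacement could be created.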

       

      KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3939-allow-replacement-when-fully-terminated
      KEP issue: https://github.com/kubernetes/enhancements/issues/3939

              Gaurav Singh (gausingh@redhat.com)
              Wei Sun
              Stephanie Stout
              Eric Rich