OpenShift Bugs / OCPBUGS-4825

Pods completed + deleted may leak


Details

    • Bug
    • Resolution: Done
    • Critical
    • None
    • 4.10.z
    • None

    Description

      Description of problem:

      When a pod runs to completion, we normally rely on the update event that reports the pod as completed. On that update, the pod IP is released and the port configuration is removed from OVN. The subsequent delete event for the pod is then ignored, on the assumption that cleanup already happened during the update.
      
      However, the update event reporting completion can be missed. In that case the only event we receive is a delete for a pod that is already in the completed state, and we skip tearing it down. As a result the pod is never cleaned up in OVN and its IP address stays allocated, shrinking the address range available for new pods. Over time this can exhaust all IP addresses available for pod allocation on a node.
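
To illustrate the failure mode described above, here is a minimal toy model, not the actual OVN-Kubernetes code. All names (`Pod`, `PodNetwork`, `on_delete_buggy`, `on_delete_fixed`, the pod name and IP) are hypothetical and exist only for this sketch. It assumes the delete handler skips teardown purely because the pod is completed, and shows one possible alternative: skip only when teardown is known to have run.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    ip: str
    completed: bool = False


@dataclass
class PodNetwork:
    """Toy stand-in for per-node IP allocation and OVN logical ports."""
    allocated: dict = field(default_factory=dict)  # pod name -> IP
    released: set = field(default_factory=set)     # pods already torn down

    def add(self, pod):
        self.allocated[pod.name] = pod.ip

    def _teardown(self, pod):
        # Idempotent: safe to call for a pod that was already cleaned up.
        self.allocated.pop(pod.name, None)
        self.released.add(pod.name)

    def on_update(self, pod):
        # Normal path: the completed update triggers cleanup.
        if pod.completed:
            self._teardown(pod)

    def on_delete_buggy(self, pod):
        # Assumes the completed update already ran, so it does nothing.
        if pod.completed:
            return
        self._teardown(pod)

    def on_delete_fixed(self, pod):
        # Skip only when teardown is known to have happened.
        if pod.name in self.released:
            return
        self._teardown(pod)


def allocations_after_missed_update(delete_handler_name):
    """Simulate a completed pod whose update event is never delivered."""
    net = PodNetwork()
    pod = Pod("job-1", "10.128.0.5")
    net.add(pod)
    pod.completed = True  # completion happens, but on_update is never called
    getattr(net, delete_handler_name)(pod)
    return dict(net.allocated)
```

Calling `allocations_after_missed_update("on_delete_buggy")` leaves the `job-1` allocation in place, while `"on_delete_fixed"` releases it. A real implementation would also need to bound the bookkeeping that the `released` set stands in for here.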

      Version-Release number of selected component (if applicable):

      4.10.24

      How reproducible:

      Not sure how to reproduce this. I'm guessing some lag in kapi updates can cause the completed update event and the final delete event to be combined into a single event.

      Steps to Reproduce:

      

      Actual results:

      The port still exists in OVN and the IP remains allocated for a deleted pod.

      Expected results:

      The IP should be freed and the port should be removed from OVN.
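
The gap between actual and expected results can be expressed as a simple consistency check: every allocation should belong to a pod that still exists. The sketch below is illustrative only. `find_leaked_allocations` and its inputs are hypothetical, and it assumes the allocation table and the list of live pod names have already been gathered by some other means.

```python
def find_leaked_allocations(allocated, live_pods):
    """Return (pod name, IP) pairs whose pod no longer exists, sorted by name.

    allocated: mapping of pod name -> IP still held in the allocator
    live_pods: iterable of pod names that currently exist
    """
    live = set(live_pods)
    return sorted(
        (name, ip) for name, ip in allocated.items() if name not in live
    )
```

An empty result corresponds to the expected behavior. Any returned pair is an address held for a pod that has already been deleted.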

      Additional info:

       


            People

              trozet@redhat.com Tim Rozet
              Anurag Saxena
              Arti Sood
              Votes: 1
              Watchers: 7
