  1. OpenShift Network Plumbing
  2. NP-805

Several pending pods at once will cause the reconcile cycle to stall

    • Type: Story
    • Resolution: Unresolved
    • Priority: Major
    • None
    • openshift-4.12, openshift-4.13, openshift-4.14
    • None

      Description of problem:

      This is a follow-up to OCPBUGS-16008. We have a fix for that bug, but a better and more efficient implementation remains to be done as a separate task. The downside of the current implementation is that a large number of pods pending at the same time makes the reconcile cycle take a long time.
      
      However, the current implementation still resolves pods stuck in the Pending state, and is overall better than having no fix. Doing things the "proper" way requires a non-trivial amount of work, so this issue tracks that effort. There are two approaches:
      1. Provide a ConfigMap that lets users set the cron schedule themselves.
      2. Keep a list of the pending pods in the reconcile looper struct and retry them. This would also need to be integrated with the ip-control-loop to sync retries. This is the most correct approach, but it is probably not doable by the November deadlines.
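
      A minimal sketch of what the second approach could look like, assuming a small in-memory tracker owned by the reconcile looper. All names here (pendingPodTracker, RetryAll, errStillStale, the retry budget) are hypothetical and are not taken from the existing code, and the ip-control-loop integration is not shown.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// errStillStale is a placeholder error used by the demo retry function.
var errStillStale = errors.New("stale IP allocation still present")

// pendingPodTracker is a hypothetical helper that remembers pods seen in the
// Pending state so that each reconcile cycle only retries those pods.
type pendingPodTracker struct {
	maxRetries int
	attempts   map[string]int // key: "namespace/name", value: failed attempts
}

func newPendingPodTracker(maxRetries int) *pendingPodTracker {
	return &pendingPodTracker{maxRetries: maxRetries, attempts: make(map[string]int)}
}

// Add starts tracking a pod. Adding an already tracked pod keeps its counter.
func (t *pendingPodTracker) Add(podRef string) {
	if _, tracked := t.attempts[podRef]; !tracked {
		t.attempts[podRef] = 0
	}
}

// Len reports how many pods are still being tracked.
func (t *pendingPodTracker) Len() int { return len(t.attempts) }

// RetryAll calls retry once for every tracked pod, in sorted order.
// Pods whose retry succeeds are dropped. Pods that reach maxRetries failed
// attempts are dropped as well and returned so the caller can report them.
func (t *pendingPodTracker) RetryAll(retry func(podRef string) error) []string {
	refs := make([]string, 0, len(t.attempts))
	for ref := range t.attempts {
		refs = append(refs, ref)
	}
	sort.Strings(refs)

	var gaveUp []string
	for _, ref := range refs {
		if err := retry(ref); err == nil {
			delete(t.attempts, ref)
			continue
		}
		t.attempts[ref]++
		if t.attempts[ref] >= t.maxRetries {
			delete(t.attempts, ref)
			gaveUp = append(gaveUp, ref)
		}
	}
	return gaveUp
}

func main() {
	tracker := newPendingPodTracker(2)
	tracker.Add("default/web-0")
	tracker.Add("default/web-1")

	// Stand-in for the real per-pod cleanup; here only web-0 succeeds.
	retry := func(ref string) error {
		if ref == "default/web-0" {
			return nil
		}
		return errStillStale
	}

	for cycle := 1; tracker.Len() > 0; cycle++ {
		gaveUp := tracker.RetryAll(retry)
		fmt.Printf("cycle %d: still tracked=%d gave up=%v\n", cycle, tracker.Len(), gaveUp)
	}
}
```

      Bounding the number of attempts per pod is one way to keep a single cycle from stalling when many pods are pending at once; whether a fixed retry budget is appropriate here is an open design question.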

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Forcefully reboot a node, then force-delete a pod in a StatefulSet that was created on the same node.
      
      This causes the pod to be recreated and remain indefinitely in the Pending state.
      
      (This will no longer be reproducible once OCPBUGS-16008 is CLOSED and the associated PR merges, but it is still useful information: if the issue reappears, we can safely say that we have broken something.)

              pliurh Peng Liu
              nsimha@redhat.com Nikhil Simha (Inactive)
              Weibin Liang Weibin Liang
              Votes: 0
              Watchers: 3

                Created:
                Updated: